
History for Configuring and Making Libmesh on Various Systems

General Notes:

   * 'libMesh' does not need to be built with 'PETSc', but without it you will not be able to run all
     of the examples (e.g., example 4 will fail).

   * 'libMesh' configures easily on standard Linux distributions, since standard Ubuntu Linux already has PETSc installed.

   * 'libMesh' will not build using *shared libraries* on hpc and bioeng22 right out of the box.

       * Note that 'contrib/tetgen/Makefile' overrides CXXFLAGS and does not include -fPIC
         so the resulting tetgen objects cannot be included in the shared
         library on bioeng22 (amd64 systems).

       * Any of the following workarounds can be used:

	 * edit 'contrib/tetgen/Makefile'

	 * configure with --disable-tetgen

	 * configure with --disable-shared

       * On hpc, configure sometimes only succeeds with --enable-shared=no.

         'CXX=xlC_r CC=xlC_r ./configure --disable-mpi' configures, and 'make
         CXX=xlC_r CXXFLAGS="-DNDEBUG -O2 -w -qansialias" CFLAGS="-DNDEBUG -O2
         -w -qansialias"' builds, but the resulting executable aborts at line
         210 in include/base/variant_filter_iterator.h.

   * Build a debug version by setting METHOD=dbg.  (Look in Make.common.in for the available options.)
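
The first workaround above (editing 'contrib/tetgen/Makefile') could be scripted roughly as follows. This is only a sketch: it patches a stand-in Makefile written to a temporary directory, and the assumption that the real Makefile assigns CXXFLAGS on a single line starting with 'CXXFLAGS' has not been checked against the libMesh sources.

```shell
#!/bin/sh
# Sketch only: the Makefile below is a stand-in created in a temp directory.
# The real contrib/tetgen/Makefile may define CXXFLAGS differently.
tmpdir=$(mktemp -d)
makefile="$tmpdir/Makefile"
printf 'CXX = g++\nCXXFLAGS = -O2 -w\n' > "$makefile"

# Append -fPIC to the CXXFLAGS assignment unless it is already present.
if ! grep -q '^CXXFLAGS.*-fPIC' "$makefile"; then
    sed 's/^\(CXXFLAGS[[:space:]]*=.*\)$/\1 -fPIC/' "$makefile" > "$makefile.new" &&
        mv "$makefile.new" "$makefile"
fi

# Show the patched line, then clean up the stand-in.
patched_flags=$(grep '^CXXFLAGS' "$makefile")
echo "$patched_flags"
rm -rf "$tmpdir"
```

To apply the same idea to a real tree, point 'makefile' at 'contrib/tetgen/Makefile' and drop the lines that create and remove the temporary directory.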

bioeng22 Configuration:

   *  You must set P4_RSHCOMMAND=ssh using export (bash) or setenv (tcsh).  
        This allows us to run mpi jobs on bioeng22 using ssh instead of
        rsh (which has no running server) as the connector.  (The '-rsh
        rshcmd' option for 'mpirun' doesn't have any effect with
        mpich-bin 1.2.5.3-5 on Ubuntu amd64.)

   * ./configure --enable-shared=no --enable-mpi --enable-petsc --with-petsc=$PETSC_DIR --with-mpi=$MPI_DIR

       *Note:*  With this configuration, a run using PETSc fails because it cannot recognize any MPI calls.  The reason is not yet known.

   * ./configure --enable-shared=no --enable-mpi --with-mpi=$MPI_DIR

       *Note:*  If $PETSC_DIR is set and --enable-petsc=no is not given explicitly, configure assumes that PETSc is wanted and then tries to use PETSc's MPI libraries.

   * ./configure --enable-shared=no --enable-mpi --with-mpi=$MPI_DIR  --enable-petsc=no

       *Note:*  This one WORKS.  I can run some simple examples and they do not fail when searching for MPI libraries.  I can also run ex9 with this configuration (after running make), but not with either of the first two configurations listed above.

   * ./configure --enable-tetgen=no  (RECOMMENDED CONFIGURE FOR SERIAL AND PARALLEL)

       *Note:*  This was suggested by Karl and requires setting the correct PETSc directories and PETSc arch.  He suggested 'setenv PETSC_DIR /usr/lib/petsc' and 'setenv PETSC_ARCH linux'.

       *Note:*  To run an example in parallel, remember to use 'mpirun -np N ex9', where N is the number of processors (e.g., 1, 2, 3 or 4).

   * If you try to 'make METHOD=dbg' you will get a failure due to 'long
     long' in 'mpio.h' not being supported.  The problem is probably in
     mpich, but the error can be suppressed by adding the '-Wno-long-long'
     flag to CXXFLAGS in 'Make.common' or 'configure'.  Adding this flag
     wherever '-pedantic' appears ensures that make will work.  (Discussed
     on the "libMesh wiki":http://libmesh.sourceforge.net/wiki/index.php/Installation#.22long_long.22_Compilation_errors_with_MPICH)

     When using Ubuntu's libpetsc2.2.0-dbg 2.2.0-4, the build fails when
     linking the examples due to unresolved symbols 'MPE_Log_event' and
     others in '/usr/lib/petsc/lib/libg/linux/libpetscsnes.a' and
     '/usr/lib/petsc/lib/libg/linux/libpetsc.a'.

     With METHOD=opt (default), shared petsc libraries are used and these
     manage their own dependencies, but when using static libraries petsc's
     unusual build system provides the dependency information in
     '/usr/lib/petscdir/2.2.0/bmake/linux/packages' for makefiles.
     libmesh's 'Make.common.in' should probably be modified to use MPE_LIB
     (and maybe MPE_INCLUDE and PETSC_HAVE_MPE) from petsc, but the quickest
     hack to get things working is to provide the following argument to make
     when using METHOD=dbg::

       MPI_LIB='-L/usr/lib/mpich/lib/shared -L/usr/lib/mpich/lib -lmpich -lmpe -lpmpich -lslog'

     Note that this MPI_LIB definition must also be used when making any examples.

   * To compile with a version of PETSc (2.3.2-p7) that has HYPRE included in the libraries you must configure libMesh in the following way:

     * './configure --disable-tetgen --disable-shared --enable-petsc --with-petsc=$PETSC_DIR' where $PETSC_DIR has been set to be your new PETSc version.

     * See "this page":http://www.cmiss.org/openCMISS/wiki/ConfiguringAndMakingPETScOnVariousSystems for how to compile the PETSc/HYPRE combination properly.
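
The recommended bioeng22 setup described above could be collected into one script along the following lines. This is a sketch, not a tested recipe: the environment values are the ones quoted in the notes, and 'DRY_RUN' and 'run' are hypothetical helpers introduced here so that the commands are only printed by default.

```shell
#!/bin/sh
# Sketch only: values are the ones quoted in the notes; adjust for your system.
export P4_RSHCOMMAND=ssh          # make mpich connect with ssh instead of rsh
export PETSC_DIR=/usr/lib/petsc   # as suggested by Karl
export PETSC_ARCH=linux

# DRY_RUN is a hypothetical switch for this sketch: by default the commands
# are only printed, so the script can be checked without a libMesh tree.
: "${DRY_RUN:=1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run ./configure --enable-tetgen=no   # recommended for serial and parallel
run make
run mpirun -np 4 ex9                 # 4 processors; change -np as needed
```

Running the script with 'DRY_RUN=0' from the libMesh source directory would execute the commands instead of printing them.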

hpc ppc-aix Configuration:

    * PETSc petsc-2.3.2-p6 built as in [Configuring and Making PETSc on Various Systems]

    * Set your PETSc environment variables, i.e. 'PETSC_ARCH=aix5.1.0.0-64' and 'PETSC_DIR=/hpc/cmiss/petsc/petsc-2.3.2-p6'

    * Configure libMesh.  For 32 bit: './configure --disable-shared CXX=xlC_r CC=xlC_r LDFLAGS=-Wl,-bbigtoc --enable-petsc --enable-mpi'.  For 64 bit: './configure --disable-shared CXX=xlC_r CXXFLAGS=-q64 CC=xlC_r CFLAGS=-q64 F77=xlf_r F77FLAGS=-q64 --enable-petsc --enable-mpi'

      '-Wl,-bbigtoc' is not required for building the library itself, only for
      linking executables later.  The documentation says that it incurs a prohibitive
      runtime penalty, so instead we should be trying to hide symbols.

    * Error in detecting xlC: the existing configure test expects that executing xlC produces a man page
      in which a grep for 'xlC' succeeds, but on our system the string it contains is 'xlc', so I
      changed the grep pattern to 'xl[cC]'.

    * Missing an include file.
		It seems to me that ppc-32-aix/include/mesh/mesh_refinement.h should
		#include <limits>

    * When building with MPI there are some macros that need to be defined
      to get the IBM mpi.h to define SEEK_CUR etc.
      I added these defines to 'ppc-32-aix/include/base/libmesh_common.h'.
      These macros are intended to create the 'SEEK_*'
      symbols from the 'MPI_SEEK_*' symbols, but they use enums, which do not
      seem to work with the templates that require them::
		  
		     // On AIX with the parallel environment we need this macro to get SEEK_CUR
		     // defined which is used by using fstream/iostream.
		     #define _MPI_CPP_BINDINGS
		     #define _MPI_CPP_ALL_CONSTANTS
		  
      Instead I had to define the 'SEEK_*' symbols directly.  This still needs
      to be made conditional on the compiler::

		     // Include the MPI definition
		     #ifdef HAVE_MPI
		     #undef SEEK_SET
		     #undef SEEK_CUR
		     #undef SEEK_END
		     // On AIX with the parallel environment we need this macro to get SEEK_CUR
		     // defined which is used by using fstream/iostream.
		     #define _MPI_CPP_BINDINGS
		     #define _MPI_CPP_ALL_CONSTANTS
		     # include <mpi.h>
		     #define SEEK_SET MPI_SEEK_SET
		     #define SEEK_CUR MPI_SEEK_CUR
		     #define SEEK_END MPI_SEEK_END
		     #endif

    * The tetgen library defines REAL, which conflicts with the AIX MPI, so
      I undefined it and defined another symbol, assuming that REAL was
      supposed to be double.  I then edited 'src/mesh/mesh_tetgen_support.C' as follows.  I could have just disabled the library instead, I guess::

		     #include "mesh_tetgen_support.h"
		     #ifdef REAL
		     #  define TETGEN_REAL double
		     #  undef REAL
		     #endif
		     
		     //#include "mesh_data.h"
		     #include "cell_tet4.h"
		     #include "face_tri3.h"
		     #include "mesh.h"
		     #ifdef TETGEN_REAL
		     #  define REAL TETGEN_REAL
		     #endif

      This enabled me to get it built.  The next step was to run it.

    * If I set MP_PROCS to the number of processors I want,
      enable rsh with the insecure ~/.rhosts mechanism, and
      repeat our compute system enough times in a host.list file,
      then I can run on multiple processors; I have tried up to 16.
      However, I don't want to encourage the use of rsh, and we need to use
      LoadLeveler to share the resources with other users.

    * We do not currently have any LoadLeveler pools defined, so we can't use
      that mechanism.

    * Instead, to use LoadLeveler, I created a job script along the lines of other
      LoadLeveler instructions::
            
            #!/bin/ksh
            # @ job_type = parallel
            ## @ environment = COPY_ALL; \
            MP_EUILIB=ip; \
            MP_INFOLEVEL=2;
            ## @ network.mpi = eth0,shared,ip
            # @ restart = no
            # @ class = small
            # @ total_tasks = 60
            # @ error = /people/username/mpi_test/mpi_batch.$(jobid).err
            # @ output = /people/username/mpi_test/mpi_batch.$(jobid).out
            # @ wall_clock_limit = 00:05:00
            # @ queue
            
            /usr/bin/poe /people/username/mpi_test/transient-diffusion -n 16 -dt 0.001

    * This worked for 10 tasks but failed with 11 tasks with the following
			 error::

            ATTENTION: 0031-408  11 tasks allocated by LoadLeveler, continuing...
            ERROR: 0031-769 Invalid task environment data received.
            ERROR: 0031-024  bob.bob.nz: no response; rc = -1

    We were at the time running::

            ppe.poe                    4.2.0.0  APPLIED    poe Parallel Operating

	 A clue in some release notes led us to try an upgrade::

            ppe.poe                    4.2.2.6  APPLIED    poe Parallel Operating

    and now I can run 60 tasks successfully.  It is possible that
    LoadLeveler and POE had simply got out of sync, as we had recently
    upgraded LoadLeveler.
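
When retesting at different task counts (10, 11 and 60 in the notes above), it can help to generate the job file rather than edit it by hand. The following is a minimal sketch: 'make_job' and its arguments are hypothetical names introduced for this example, the commented-out environment and network lines of the original script are omitted, and the submission step is left commented out because it needs a working LoadLeveler installation.

```shell
#!/bin/sh
# Sketch only: print a LoadLeveler job file for a given number of tasks.
# make_job and its two arguments are hypothetical names for this example.
make_job() {
    tasks=$1
    workdir=$2
    cat <<EOF
#!/bin/ksh
# @ job_type = parallel
# @ restart = no
# @ class = small
# @ total_tasks = $tasks
# @ error = $workdir/mpi_batch.\$(jobid).err
# @ output = $workdir/mpi_batch.\$(jobid).out
# @ wall_clock_limit = 00:05:00
# @ queue

/usr/bin/poe $workdir/transient-diffusion -n 16 -dt 0.001
EOF
}

jobfile=$(mktemp)
make_job 11 /people/username/mpi_test > "$jobfile"
grep 'total_tasks' "$jobfile"   # show the line that was parameterized
# Submission would then be done with LoadLeveler's llsubmit:
# llsubmit "$jobfile"
rm -f "$jobfile"
```

The escaped '\$(jobid)' keeps the LoadLeveler variable literal in the generated file instead of letting the shell expand it.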