Configuring and Making libMesh on Various Systems
General Notes:
'libMesh' does not need to be built with 'PETSc', but without it you will not be able to run all of the examples (e.g., example 4 will fail).
'libMesh' configures easily on standard Linux distributions, since standard Ubuntu Linux has PETSc preinstalled.
'libMesh' will not build using shared libraries on hpc and bioeng22 right out of the box.
Note that 'contrib/tetgen/Makefile' overrides CXXFLAGS and does not include -fPIC, so the resulting tetgen objects cannot be included in the shared library on bioeng22 (amd64 systems).
Any of the following workarounds will work:
- edit 'contrib/tetgen/Makefile' to add -fPIC to its CXXFLAGS (see the sketch after this list)
- configure with --disable-tetgen
- configure with --disable-shared
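A minimal sketch of the Makefile workaround, assuming GNU sed and that 'contrib/tetgen/Makefile' assigns CXXFLAGS with a plain '=' (a .bak backup of the Makefile is kept):

  cd contrib/tetgen
  # Append -fPIC to the overridden CXXFLAGS so the objects can be
  # linked into a shared library.
  sed -i.bak 's/^CXXFLAGS *=/& -fPIC/' Makefile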
On hpc, libMesh will sometimes only configure using --enable-shared=no.
'CXX=xlC_r CC=xlC_r ./configure --disable-mpi' configures and 'make CXX=xlC_r CXXFLAGS="-DNDEBUG -O2 -w -qansialias" CFLAGS="-DNDEBUG -O2 -w -qansialias"' builds, but the resulting executable aborts at line 210 of include/base/variant_filter_iterator.h.
Build a debug version by setting METHOD=dbg (look in Make.common.in for the other options).
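For example, since METHOD is read at make time (the default is opt):

  make METHOD=dbg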
bioeng22 Configuration:
- You must set P4_RSHCOMMAND=ssh using export (bash) or setenv (tcsh).
This allows MPI jobs to run on bioeng22 using ssh instead of rsh (which has no running server) as the connector. (The '-rsh rshcmd' option for 'mpirun' has no effect with mpich-bin 1.2.5.3-5 on Ubuntu amd64.)
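For example:

  export P4_RSHCOMMAND=ssh   # bash
  setenv P4_RSHCOMMAND ssh   # tcsh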
(1) ./configure --enable-shared=no --enable-mpi --enable-petsc --with-petsc=$PETSC_DIR --with-mpi=$MPI_DIR
Note: With configuration (1), a run using PETSc fails because it cannot recognize any MPI calls. (WHY?)
(2) ./configure --enable-shared=no --enable-mpi --with-mpi=$MPI_DIR
Note: If $PETSC_DIR is set and I do not explicitly say --enable-petsc=no, configure assumes that I want to use PETSc and then tries to use PETSc's MPI libraries.
(3) ./configure --enable-shared=no --enable-mpi --with-mpi=$MPI_DIR --enable-petsc=no
Note: This one WORKS. I can run some simple examples and they do not fail when searching for MPI libraries. I can also build (configure and then make) and run ex9 with configuration (3), but not with (1) or (2).
./configure --enable-tetgen=no (RECOMMENDED CONFIGURATION FOR SERIAL AND PARALLEL)
Note: This was suggested by Karl and requires setting the correct PETSc directories and PETSc arch. He suggested 'setenv PETSC_DIR /usr/lib/petsc' and 'setenv PETSC_ARCH linux'.
Note: To run any of the examples, remember to use 'mpirun -np N ex9' to run on N processors (e.g., N = 1, 2, 3, or 4).
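Putting the recommended configuration together, as a sketch (tcsh syntax as suggested above; bash users would use 'export PETSC_DIR=/usr/lib/petsc' etc.):

  setenv PETSC_DIR /usr/lib/petsc
  setenv PETSC_ARCH linux
  ./configure --enable-tetgen=no
  make
  mpirun -np 2 ex9   # run from the directory containing the ex9 executable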
If you try to 'make METHOD=dbg' you will get a failure due to 'long long' in 'mpio.h' not being supported. The problem is probably in mpich, but the error can be suppressed by adding the '-Wno-long-long' flag to CXXFLAGS in 'Make.common' or 'configure'. Adding this flag wherever '-pedantic' appears ensures that make will work. (Discussed on the "libMesh wiki":http://libmesh.sourceforge.net/wiki/index.php/Installation#.22long_long.22_Compilation_errors_with_MPICH)
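One way to apply that edit is a one-line substitution (a sketch assuming GNU sed; a .bak backup of Make.common is kept in case the pattern matches more than intended):

  # Add -Wno-long-long alongside every occurrence of -pedantic.
  sed -i.bak 's/-pedantic/-pedantic -Wno-long-long/g' Make.common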
When using Ubuntu's libpetsc2.2.0-dbg 2.2.0-4, the build fails when linking the examples due to unresolved symbols 'MPE_Log_event' and others in '/usr/lib/petsc/lib/libg/linux/libpetscsnes.a' and '/usr/lib/petsc/lib/libg/linux/libpetsc.a'.
With METHOD=opt (the default), shared PETSc libraries are used and these manage their own dependencies; when using static libraries, PETSc's unusual build system provides the dependency information for makefiles in '/usr/lib/petscdir/2.2.0/bmake/linux/packages'. libMesh's 'Make.common.in' should probably be modified to use MPE_LIB (and maybe MPE_INCLUDE and PETSC_HAVE_MPE) from PETSc, but the quickest hack to get things working is to provide the following argument to make when using METHOD=dbg:
  MPI_LIB='-L/usr/lib/mpich/lib/shared -L/usr/lib/mpich/lib -lmpich -lmpe -lpmpich -lslog'

Note that this MPI_LIB definition must also be used when making any of the examples.
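For example, the full invocation becomes:

  make METHOD=dbg MPI_LIB='-L/usr/lib/mpich/lib/shared -L/usr/lib/mpich/lib -lmpich -lmpe -lpmpich -lslog'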
To compile with a version of PETSc (2.3.2-p7) that has HYPRE included in the libraries, you must configure libMesh in the following way:
- './configure --disable-tetgen --disable-shared --enable-petsc --with-petsc=$PETSC_DIR' where $PETSC_DIR has been set to your new PETSc version.
- See "this page":http://www.cmiss.org/openCMISS/wiki/ConfiguringAndMakingPETScOnVariousSystems for how to compile the PETSc/HYPRE combination properly.
hpc ppc-aix Configuration:
PETSc petsc-2.3.2-p6 built as described in Configuring and Making PETSc on Various Systems.
Set your PETSc environment variables, i.e. 'PETSC_ARCH=aix5.1.0.0-64' and 'PETSC_DIR=/hpc/cmiss/petsc/petsc-2.3.2-p6'.
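For example, in tcsh:

  setenv PETSC_ARCH aix5.1.0.0-64
  setenv PETSC_DIR /hpc/cmiss/petsc/petsc-2.3.2-p6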
- configure libMesh:
  for 32-bit: './configure --disable-shared CXX=xlC_r CC=xlC_r LDFLAGS=-Wl,-bbigtoc --enable-petsc --enable-mpi'
  for 64-bit: './configure --disable-shared CXX=xlC_r CXXFLAGS=-q64 CC=xlC_r CFLAGS=-q64 F77=xlf_r F77FLAGS=-q64 --enable-petsc --enable-mpi'
- -Wl,-bbigtoc isn't required for building the library but for making executables later. The documentation says that it incurs a prohibitive runtime penalty, so instead we should be trying to hide symbols.
- There is an error in detecting xlc: the existing test expects executing xlC to produce a man page that successfully greps for 'xlC', but on our system the string produced is 'xlc', so I changed the grep pattern to accept both spellings (presumably 'xl[cC]').
- Missing an include file: it seems to me that ppc-32-aix/include/mesh/mesh_refinement.h should '#include <limits>'.
- When building with MPI there are some macros that need to be defined to get the IBM mpi.h to define SEEK_CUR etc.
- I added these defines to 'ppc-32-aix/include/base/libmesh_common.h'. There are symbols to define which are intended to create the 'SEEK_*' symbols from the 'MPI_SEEK_*' symbols, but they use enums which do not seem to work with the templates that required them:
  // On AIX with the parallel environment we need this macro to get SEEK_CUR
  // defined, which is used by fstream/iostream.
  #define _MPI_CPP_BINDINGS
  #define _MPI_CPP_ALL_CONSTANTS

- Instead I had to just define them; I need to make this depend on the compiler:
  // Include the MPI definition
  #ifdef HAVE_MPI
  #undef SEEK_SET
  #undef SEEK_CUR
  #undef SEEK_END
  // On AIX with the parallel environment we need this macro to get SEEK_CUR
  // defined, which is used by fstream/iostream.
  #define _MPI_CPP_BINDINGS
  #define _MPI_CPP_ALL_CONSTANTS
  # include <mpi.h>
  #define SEEK_SET MPI_SEEK_SET
  #define SEEK_CUR MPI_SEEK_CUR
  #define SEEK_END MPI_SEEK_END
  #endif
- The tetgen library defines REAL, which conflicts with the AIX MPI, so I undefined it and defined another symbol, assuming that REAL was supposed to be double. I then edited 'src/mesh/mesh_tetgen_support.C' as follows. I could have just disabled the library, I guess:
#include "mesh_tetgen_support.h" #ifdef REAL # define TETGEN_REAL double # undef REAL #endif
//#include "mesh_data.h" #include "cell_tet4.h" #include "face_tri3.h" #include "mesh.h" #ifdef TETGEN_REAL # define REAL TETGEN_REAL #endif
This enabled me to get it built. Then I needed to run it.
- If I set MP_PROCS to the number of processors I want, enable rsh with the insecure ~/.rhosts mechanism, and repeat our compute system enough times in a host.list file, then I can run on multiple processors; I have tried up to 16 processors. However, I don't want to encourage the use of rsh, and we need to use LoadLeveler to share the resources with other users.
- We do not currently have any LoadLeveler pools defined, so we can't use that mechanism.
- Instead, to use LoadLeveler, I created a script file following other LoadLeveler instructions:
  #!/bin/ksh
  # @ job_type = parallel
  ## @ environment = COPY_ALL; MP_EUILIB=ip; MP_INFOLEVEL=2;
  ## @ network.mpi = eth0,shared,ip
  # @ restart = no
  # @ class = small
  # @ total_tasks = 60
  # @ error = /people/username/mpi_test/mpi_batch.$(jobid).err
  # @ output = /people/username/mpi_test/mpi_batch.$(jobid).out
  # @ wall_clock_limit = 00:05:00
  # @ queue
  /usr/bin/poe /people/username/mpi_test/transient-diffusion -n 16 -dt 0.001
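The script is then submitted with LoadLeveler's 'llsubmit' and can be monitored with 'llq' (the script file name here is hypothetical):

  llsubmit mpi_batch.ll
  llq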
- This worked for 10 tasks but failed with 11 tasks with the following error:

  ATTENTION: 0031-408  11 tasks allocated by LoadLeveler, continuing...
  ERROR: 0031-769  Invalid task environment data received.
  ERROR: 0031-024  bob.bob.nz: no response; rc = -1
We were at the time running:

  ppe.poe  4.2.0.0  APPLIED  poe Parallel Operating

A clue in some release notes led us to try an upgrade:

  ppe.poe  4.2.2.6  APPLIED  poe Parallel Operating
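(These lines are the sort of listing produced by AIX's lslpp; the installed fileset level can be checked with:

  lslpp -l ppe.poe
)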
- and now I can run 60 tasks successfully. It is possible that LoadLeveler and POE just got out of sync, as we had recently upgraded LoadLeveler.