Configuring and Making Libmesh on Various Systems

last edited 4 years ago by blackett

General Notes:

  • libMesh does not need to be built with PETSc, but without it not all of the examples will run (e.g., example 4 will fail).
  • libMesh configures easily on standard Linux distributions, since standard Ubuntu Linux has PETSc preinstalled.
  • libMesh will not build with shared libraries on hpc and bioeng22 out of the box.
    • contrib/tetgen/Makefile overrides CXXFLAGS and does not include -fPIC, so the resulting tetgen objects cannot be linked into the shared library on bioeng22 (amd64 systems).
    • Any one of the following workarounds can be used:
      • edit contrib/tetgen/Makefile to add -fPIC
      • configure with --disable-tetgen
      • configure with --disable-shared
    • On hpc, configure sometimes only succeeds with --enable-shared=no.

      With the xlC_r compiler, the following configures and builds, but the resulting executable aborts at line 210 of include/base/variant_filter_iterator.h:

           CXX=xlC_r CC=xlC_r ./configure --disable-mpi
           make CXX=xlC_r CXXFLAGS="-DNDEBUG -O2 -w -qansialias" CFLAGS="-DNDEBUG -O2 -w -qansialias"

  • Build a debug version by setting METHOD=dbg when running make. (Look in Make.common.in for the other options.)
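
As a rough sketch of the first tetgen workaround above, the shell function below appends -fPIC to the CXXFLAGS assignment in a makefile. It assumes the flags are set on a single line starting with CXXFLAGS, which has not been checked against the actual contrib/tetgen/Makefile. The add_fpic name is made up for this example.

```shell
# Sketch only: append -fPIC to the CXXFLAGS assignment in a makefile.
# Assumption: the flags are assigned on one line beginning with "CXXFLAGS".
add_fpic() {
    makefile=$1
    # Nothing to do if the flag is already there.
    if grep -q -e '-fPIC' "$makefile"; then
        return 0
    fi
    # Append the flag to the end of the CXXFLAGS line (=, := or += forms).
    sed '/^CXXFLAGS[[:space:]]*[:+]*=/ s/$/ -fPIC/' "$makefile" > "$makefile.tmp" &&
        mv "$makefile.tmp" "$makefile"
}

# Intended use, from the top of the libMesh source tree:
#   add_fpic contrib/tetgen/Makefile
#   ./configure
```

The other two workarounds need no editing: pass --disable-tetgen or --disable-shared to configure instead.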

bioeng22 Configuration:

  • You must set P4_RSHCOMMAND=ssh using export (bash) or setenv (tcsh). This lets MPI jobs on bioeng22 use ssh as the connector instead of rsh, for which no server is running. (The -rsh rshcmd option of mpirun has no effect with mpich-bin 1.2.5.3-5 on Ubuntu amd64.)
  • ./configure --enable-shared=no --enable-mpi --enable-petsc --with-petsc=$PETSC_DIR --with-mpi=$MPI_DIR

    Note: Using the above configuration a run using PETSc fails because it cannot recognize any MPI calls. (WHY?)

  • ./configure --enable-shared=no --enable-mpi --with-mpi=$MPI_DIR

    Note: If $PETSC_DIR is set and --enable-petsc=no is not given explicitly, configure assumes that PETSc is wanted and then tries to use PETSc's MPI libraries.

  • ./configure --enable-shared=no --enable-mpi --with-mpi=$MPI_DIR --enable-petsc=no

    Note: This one WORKS. I can run some simple examples and they do not fail when searching for MPI libraries. I can also run ex9 with this configuration (after running make), but not with the first two configurations above.

  • ./configure --enable-tetgen=no (RECOMMENDED CONFIGURE FOR SERIAL AND PARALLEL)

    Note: This was suggested by Karl and requires setting the correct PETSc directories and PETSc arch. He suggested setenv PETSC_DIR /usr/lib/petsc and setenv PETSC_ARCH linux.

    Note: To run any of the examples, remember to use mpirun -np N ex9, where N is the number of processors (e.g., mpirun -np 4 ex9 for 4 processors).

  • If you try make METHOD=dbg, the build fails because long long in mpio.h is not supported. The problem is probably in mpich, but the error can be suppressed by adding the -Wno-long-long flag to CXXFLAGS in Make.common or configure. Adding this flag wherever -pedantic appears ensures that make will work. (Discussed on the libMesh wiki.)

    When using Ubuntu's libpetsc2.2.0-dbg 2.2.0-4, the build fails when linking the examples due to unresolved symbols MPE_Log_event and others in /usr/lib/petsc/lib/libg/linux/libpetscsnes.a and /usr/lib/petsc/lib/libg/linux/libpetsc.a.

    With METHOD=opt (default), shared petsc libraries are used and these manage their own dependencies, but when using static libraries petsc's unusual build system provides the dependency information in /usr/lib/petscdir/2.2.0/bmake/linux/packages for makefiles. libmesh's Make.common.in should probably be modified to use MPE_LIB (and maybe MPE_INCLUDE and PETSC_HAVE_MPE) from petsc, but the quickest hack to get things working is to provide the following argument to make when using METHOD=dbg:

           MPI_LIB='-L/usr/lib/mpich/lib/shared -L/usr/lib/mpich/lib -lmpich -lmpe -lpmpich -lslog'
    

    Note that this MPI_LIB definition must also be used when making any examples.

  • To compile with a version of PETSc (2.3.2-p7) that has HYPRE included in the libraries you must configure libMesh in the following way:
    • ./configure --disable-tetgen --disable-shared --enable-petsc --with-petsc=$PETSC_DIR where $PETSC_DIR has been set to be your new PETSc version.
    • See this page for how to compile the PETSc/HYPRE combination properly.
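
Putting the recommended bioeng22 settings together, a session might look roughly like the sketch below. The run wrapper and the DRY_RUN variable are made up for this example, so the commands are only printed unless DRY_RUN=0 is set. The PETSc values are the ones Karl suggested, and running ex9 from the current directory is an assumption.

```shell
# Hypothetical wrapper: print a command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Environment described in the notes above (sh/bash syntax; use setenv in tcsh).
export P4_RSHCOMMAND=ssh         # mpirun connects with ssh instead of rsh
export PETSC_DIR=/usr/lib/petsc  # value suggested by Karl
export PETSC_ARCH=linux          # value suggested by Karl

# Recommended configure for serial and parallel runs.
run ./configure --enable-tetgen=no

if [ "${METHOD:-opt}" = "dbg" ]; then
    # Debug builds also needed the extra MPE libraries from the notes above.
    MPI_LIB='-L/usr/lib/mpich/lib/shared -L/usr/lib/mpich/lib -lmpich -lmpe -lpmpich -lslog'
    run make METHOD=dbg MPI_LIB="$MPI_LIB"
else
    run make
fi

# Run an example on 1, 2 and 4 processors (assumes ex9 is in the current directory).
for np in 1 2 4; do
    run mpirun -np "$np" ex9
done
```

Setting METHOD=dbg in the environment selects the debug branch. Note that the debug build also needed -Wno-long-long added to CXXFLAGS, which this sketch does not do.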

hpc ppc-aix Configuration:

  • PETSc petsc-2.3.2-p6 built as in Configuring and Making PETSc on Various Systems
  • Set your PETSc environment variables, i.e. PETSC_ARCH=aix5.1.0.0-64 and PETSC_DIR=/hpc/cmiss/petsc/petsc-2.3.2-p6
  • Configure libMesh:
    • for 32 bit: ./configure --disable-shared CXX=xlC_r CC=xlC_r LDFLAGS=-Wl,-bbigtoc --enable-petsc --enable-mpi
    • for 64 bit: ./configure --disable-shared CXX=xlC_r CXXFLAGS=-q64 CC=xlC_r CFLAGS=-q64 F77=xlf_r F77FLAGS=-q64 --enable-petsc --enable-mpi

    -Wl,-bbigtoc isn't required for building the library, only for linking executables later. The documentation says that it incurs a prohibitive runtime penalty, so we should instead be trying to hide symbols.

  • Error in detecting xlC: the existing configure test expects that executing xlC produces a man page in which grep finds xlC, but on our system the string it contains is xlc, so I changed the grep pattern to 'xl[cC]?'.
  • Missing an include file. It seems to me that ppc-32-aix/include/mesh/mesh_refinement.h should #include
  • When building with MPI, some macros need to be defined to get the IBM mpi.h to define SEEK_CUR etc. I added these defines to ppc-32-aix/include/base/libmesh_common.h. They are intended to create the SEEK_* symbols from the MPI_SEEK_* symbols, but they use enums, which do not seem to work with the templates that require them:
                         // On AIX with the parallel environment we need this macro to get SEEK_CUR
                         // defined which is used by using fstream/iostream.
                         #define _MPI_CPP_BINDINGS
                         #define _MPI_CPP_ALL_CONSTANTS
    
    Instead I had to define them directly. I still need to make this depend on the compiler:
    
                         // Include the MPI definition
                         #ifdef HAVE_MPI
                         #undef SEEK_SET
                         #undef SEEK_CUR
                         #undef SEEK_END
                         // On AIX with the parallel environment we need this macro to get SEEK_CUR
                         // defined which is used by using fstream/iostream.
                         #define _MPI_CPP_BINDINGS
                         #define _MPI_CPP_ALL_CONSTANTS
                         # include <mpi.h>
                         #define SEEK_SET MPI_SEEK_SET
                         #define SEEK_CUR MPI_SEEK_CUR
                         #define SEEK_END MPI_SEEK_END
                         #endif
    
  • The tetgen library defines REAL, which conflicts with the AIX MPI, so I undefined it and defined another symbol, assuming that REAL was supposed to be double. I edited src/mesh/mesh_tetgen_support.C as follows. (I could have just disabled the library, I guess.)
                         #include "mesh_tetgen_support.h"
                         #ifdef REAL
                         #  define TETGEN_REAL double
                         #  undef REAL
                         #endif
    
                         //#include "mesh_data.h"
                         #include "cell_tet4.h"
                         #include "face_tri3.h"
                         #include "mesh.h"
                         #ifdef TETGEN_REAL
                         #  define REAL TETGEN_REAL
                         #endif
    
    This enabled me to get it built. Then I needed to run it.

  • If I set MP_PROCS to the number of processors I want, enable rsh with the insecure ~/.rhosts mechanism, and repeat our compute system enough times in a host.list file, then I can run on multiple processors; I have tried up to 16. However, I don't want to encourage the use of rsh, and we need to use LoadLeveler to share the resources with other users.
    
  • We do not currently have any LoadLeveler pools defined, so we can't use that mechanism.
  • Instead, to use LoadLeveler I created a script file along the lines of other LoadLeveler instructions:
                #!/bin/ksh
                # @ job_type = parallel
                ## @ environment = COPY_ALL; \
                MP_EUILIB=ip; \
                MP_INFOLEVEL=2;
                ## @ network.mpi = eth0,shared,ip
                # @ restart = no
                # @ class = small
                # @ total_tasks = 60
                # @ error = /people/username/mpi_test/mpi_batch.$(jobid).err
                # @ output = /people/username/mpi_test/mpi_batch.$(jobid).out
                # @ wall_clock_limit = 00:05:00
                # @ queue
    
                /usr/bin/poe /people/username/mpi_test/transient-diffusion -n 16 -dt 0.001
    
  • This worked for 10 tasks but failed with 11 tasks with the following error:
                ATTENTION: 0031-408  11 tasks allocated by LoadLeveler, continuing...
                ERROR: 0031-769 Invalid task environment data received.
                ERROR: 0031-024  bob.bob.nz: no response; rc = -1
    

We were at the time running:

            ppe.poe                    4.2.0.0  APPLIED    poe Parallel Operating

A clue in some release notes led us to try an upgrade:

            ppe.poe                    4.2.2.6  APPLIED    poe Parallel Operating

and now I can run 60 tasks successfully. It is possible that LoadLeveler and POE had simply got out of sync, as we had recently upgraded LoadLeveler.
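
To repeat this kind of test with different task counts (10, 11 and 60 above), the job command file could be generated rather than edited by hand. This is only a sketch: write_job is a made-up helper, the keywords, paths and program arguments are copied from the script above, and the commented-out environment and network lines are left out.

```shell
# Hypothetical helper: write a LoadLeveler job command file for a given
# number of tasks. Usage: write_job TASKS JOBFILE
write_job() {
    tasks=$1
    jobfile=$2
    # \$(jobid) is escaped so that LoadLeveler, not the shell, expands it.
    cat > "$jobfile" <<EOF
#!/bin/ksh
# @ job_type = parallel
# @ restart = no
# @ class = small
# @ total_tasks = $tasks
# @ error = /people/username/mpi_test/mpi_batch.\$(jobid).err
# @ output = /people/username/mpi_test/mpi_batch.\$(jobid).out
# @ wall_clock_limit = 00:05:00
# @ queue

/usr/bin/poe /people/username/mpi_test/transient-diffusion -n 16 -dt 0.001
EOF
}

# Write a 10-task job file into the temporary directory.
write_job 10 "${TMPDIR:-/tmp}/mpi_batch.cmd"

# Once the file looks right, it would be submitted with LoadLeveler's llsubmit:
#   llsubmit "${TMPDIR:-/tmp}/mpi_batch.cmd"
```

Generating one file per task count makes it easier to find the point at which a run starts to fail, as happened between 10 and 11 tasks here.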