Using Sundance
I have spent a couple of days getting Sundance to run our test problem.
Notes:
- Nice connection between the weak integral form of the equations and the code that I have to write.
- Large executables (70 MB) are generated even for small problems, presumably mostly libraries and the like.
- Got some results after only a couple of days working on it.
Procedure:
To build on bioeng22:
- Trilinos 6.0.16: I used the sundance-linux-mpi-opt configuration but removed -lumfpack, -lamd and -enable-amesos-umfpack, because earlier build scripts were failing to find these libraries even though they are installed on the system.
- Sundance 2.1.2: I used linux-mpi-opt.
Once the test examples ran, I tried 'mpirun -np 2' with 'setenv P4_RSHCOMMAND ssh' set in the environment.
Initially I got an error that Karl got with libmesh on the Stokes2D.exe problem:
p0_6750: p4_error: Child process exited while making connection to remote process on localhost: 0
This seems to depend on the problem as well, and in a new shell the error had gone away completely. Anyway, I moved on.
There are no explicit time-stepping objects in Sundance, but following an old Sundance 1.0 example I implemented Crank-Nicolson and fully implicit schemes symbolically:
<a href="OCtimeStepHeat2D.cpp">OCtimeStepHeat2D.cpp</a>
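As a rough illustration of the two schemes mentioned above (not the Sundance code in the linked file), here is a minimal sketch of the theta method on a 1D heat equation. It swaps in finite differences for the finite elements Sundance uses. The function names, grid size, time step and diffusivity are all assumptions chosen for the example. Setting theta = 0.5 gives Crank-Nicolson and theta = 1.0 gives the fully implicit scheme.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system; inputs are not modified."""
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def theta_step(u, kappa, dt, dx, theta):
    """Advance the interior values u by one step of the theta scheme.

    theta = 0.5 is Crank-Nicolson, theta = 1.0 is fully implicit.
    Boundary values are fixed at zero (Dirichlet).
    """
    n = len(u)
    r = kappa * dt / dx ** 2
    # Left-hand side: (I - theta*r*L) u_new, with L the stencil [1, -2, 1].
    lower = [-theta * r] * n
    diag = [1.0 + 2.0 * theta * r] * n
    upper = [-theta * r] * n
    # Right-hand side: (I + (1 - theta)*r*L) u_old.
    rhs = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        rhs.append(u[i] + (1.0 - theta) * r * (left - 2.0 * u[i] + right))
    return solve_tridiagonal(lower, diag, upper, rhs)

def run(theta, n_cells=50, n_steps=10, dt=0.01, kappa=1.0):
    """Return the max error against the exact solution exp(-kappa*pi^2*t)*sin(pi*x)."""
    dx = 1.0 / n_cells
    xs = [i * dx for i in range(1, n_cells)]
    u = [math.sin(math.pi * x) for x in xs]
    for _ in range(n_steps):
        u = theta_step(u, kappa, dt, dx, theta)
    t = n_steps * dt
    exact = [math.exp(-kappa * math.pi ** 2 * t) * math.sin(math.pi * x) for x in xs]
    return max(abs(a - b) for a, b in zip(u, exact))

if __name__ == "__main__":
    print("Crank-Nicolson max error:", run(0.5))
    print("fully implicit max error:", run(1.0))
```

With the same time step, the Crank-Nicolson run should show the smaller error, since it is second-order accurate in time while the fully implicit scheme is first-order.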
Results:
On bioeng22, which has 4 CPUs (2 x Dual Core AMD Opteron(tm) Processor 275, 2210.198 MHz, 1024 KB cache) and 8154836 kB RAM:
For one processor, 10 steps with a 200x200 grid (time about 2:33.48; memory VIRT 408M, RES 333M):
eqn = delU[0]*(U[0]-soln[0])+0.02*(0.5*(D[delU[0], x]*D[U[0]+soln[0], x])+0.5*(D[delU[0], y]*D[U[0]+soln[0], y]))
u0 = {soln[0]}
[9] error norm = 0.000493527
[1] eval vector flops: 2.928e+07
[1] quadrature flops: 3.39361e+08
[1] ref integration flops: 4.93449e+08
[1] cell jacobian batch flops: 1.94498e+09
[1] quadrature eval mediator: 1.93632e+09
get mesh : 0.621094
unbatched facet grabbing : 3.50781
batched facet grabbing : 0.0195312
Expr symbolic ops : 0
symbolic preprocessing : 0.00390625
assembler ctor : 6.04688
DOF map building : 6.02734
assembly : 9.26562
matrix config : 1.54297
matrix graph determination : 1.30469
graph column processing : 0.765625
batched dof lookup : 1.93359
tmp graph flattening : 0.246094
matrix allocation : 0.0429688
Low-level vector operations : 15.1914
cell Jacobian grabbing : 0.0976562
Symbolic Evaluation : 2.01953
coord function evaluation : 0.0703125
integration : 2.95312
ref integration : 1.90625
jacobian factoring : 0.207031
matrix insertion : 1.85547
quadrature : 0.753906
vector insertion : 0.347656
linear solve : 108.938
discrete function evaluation : 1.46484
building integral transformation matrices: 0.8125
jacobian inversion : 0.425781
Expr output : 0
viz output : 8.76172
unbatched dof lookup : 4.64844
For two processors (time about 2:02.55; memory VIRT 250M & 220M, RES 178M):
mpirun -np 2 ./OCtimeStepHeat2D.exe
error norm = 0.000498438
error norm = 0.000498438
[1] eval vector flops: 1.47864e+07
[1] quadrature flops: 1.71378e+08
[1] ref integration flops: 2.49196e+08
[1] cell jacobian batch flops: 9.82217e+08
[1] quadrature eval mediator: 9.77842e+08
For three processors (time about 1:27.45; memory VIRT 190M & 160M & 160M, RES 120M & 120M & 120M):
mpirun -np 3 ./OCtimeStepHeat2D.exe
error norm = 0.000499121
error norm = error norm = 0.000499121 0.000499121
[1] eval vector flops: 9.8088e+06
[1] quadrature flops: 1.13686e+08
[1] ref integration flops: 1.65318e+08
[1] cell jacobian batch flops: 6.51572e+08
[1] quadrature eval mediator: 6.48667e+08
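To put the three 200x200 runs side by side, the wall-clock times quoted above can be turned into speedup and parallel efficiency figures. This is a small sketch: the helper names are made up for the example, and it assumes the timer output is in seconds when comparing the linear solve time with the total.

```python
def to_seconds(clock):
    """Convert 'm:ss.ss' or 'h:mm:ss.ss' wall-clock text to seconds."""
    total = 0.0
    for part in clock.split(":"):
        total = total * 60.0 + float(part)
    return total

# Wall-clock times quoted in the notes for 10 steps on the 200x200 grid,
# keyed by number of processors.
timings = {1: "2:33.48", 2: "2:02.55", 3: "1:27.45"}

def scaling_table(timings):
    """Return (nproc, seconds, speedup, efficiency) rows relative to one processor."""
    base = to_seconds(timings[1])
    rows = []
    for nproc in sorted(timings):
        elapsed = to_seconds(timings[nproc])
        speedup = base / elapsed
        rows.append((nproc, elapsed, speedup, speedup / nproc))
    return rows

if __name__ == "__main__":
    for nproc, elapsed, speedup, eff in scaling_table(timings):
        print(f"{nproc} proc: {elapsed:7.2f} s  speedup {speedup:4.2f}  efficiency {eff:4.2f}")
    # Share of the one-processor run spent in the linear solve
    # (108.938 in the timer output, assumed to be seconds).
    share = 108.938 / to_seconds(timings[1])
    print(f"linear solve share: {share:.0%}")
```

The linear solve dominates the one-processor timer output, so the scaling seen here is probably governed mostly by how well the solver parallelises rather than by assembly.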
For one processor, 1 step with a 1000x1000 grid (time about 29:13.51; memory VIRT 8113M, RES 7.5G):
error norm = 0.00176444
1 eval vector flops: 1.92e+08
1 quadrature flops: 1.014e+09
1 ref integration flops: 1.30005e+09
1 cell jacobian batch flops: 5.10001e+09
1 quadrature eval mediator: 5.028e+09
get mesh : 14.2266
unbatched facet grabbing : 86.4805
batched facet grabbing : 0.441406
Expr symbolic ops : 0.0429688
symbolic preprocessing : 0.0625
assembler ctor : 145.75
DOF map building : 145.191
assembly : 86.5898
matrix config : 51.6289
matrix graph determination : 45.875
graph column processing : 28.6523
batched dof lookup : 5.07422
tmp graph flattening : 7.93359
matrix allocation : 0.96875
Low-level vector operations : 204.188
cell Jacobian grabbing : 0.632812
Symbolic Evaluation : 11.7148
coord function evaluation : 2.1875
integration : 9.19531
ref integration : 5.33594
jacobian factoring : 1.25781
matrix insertion : 9.19141
quadrature : 2.60547
vector insertion : 1.875
linear solve : 1445.57
discrete function evaluation : 5.41797
building integral transformation matrices: 2.24609
jacobian inversion : 1.14844
Expr output : 0.046875
viz output : 21.9258
unbatched dof lookup : 11.5977
For two processors, 50 steps of 0.001 s with a 1000x1000 grid and tolerance at 1e-02, so 60-70 iterations (time about 1:08:16.29; memory VIRT 3337m & 3311m, RES 3.2g & 3.2g):