Coupled Electro Mechanics
PAGE IN DEVELOPMENT
Activation:
Solve method / convergence when using complex (>20 ODE) cell models:
Results for a 2x2x2 mm test cube using default conductivities and the Niederer and Smith cell model coded in CellML. The solution is calculated with a fully implicit solve using a biconjugate gradient solver and the LSODA integrator.
For a stimulus of 150 uA/mm2 applied within a 0.1 mm radius of the corner node, wave speed is calculated as the cube diagonal less 0.1 mm, divided by the time taken for the activation wave to reach the far corner node.
Collocation:
spacing (mm)             0.28   0.13   0.09
wave speed (mm/ms)       0.12   0.17   0.18
median solve time (s)    1.4    17     60
FEM:
spacing (mm)             0.28   0.181  0.13
wave speed (mm/ms)       0.17   0.19   0.19
median solve time (s)    3.2    10     24
In all future simulations a FEM-based activation scheme is used with approximately 0.2 mm grid spacing.
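As a worked check of the wave speed definition used for the results above, the following is a minimal Python sketch. The function names are hypothetical and not part of CMISS. It assumes the 2 mm cube side and the 0.1 mm stimulus radius from the text, and it back-calculates the arrival time implied by a reported wave speed rather than using a measured arrival time.

```python
import math


def activation_path_mm(side_mm, stimulus_radius_mm):
    """Cube diagonal less the stimulated radius around the corner node."""
    return side_mm * math.sqrt(3.0) - stimulus_radius_mm


def wave_speed(side_mm, stimulus_radius_mm, arrival_time_ms):
    """Wave speed in mm/ms from the far-corner arrival time."""
    return activation_path_mm(side_mm, stimulus_radius_mm) / arrival_time_ms


def implied_arrival_time(side_mm, stimulus_radius_mm, speed_mm_per_ms):
    """Arrival time in ms implied by a reported wave speed (inverse of wave_speed)."""
    return activation_path_mm(side_mm, stimulus_radius_mm) / speed_mm_per_ms


# Reported FEM wave speed at the finest spacing in the results above.
reported_speed = 0.19
t_ms = implied_arrival_time(2.0, 0.1, reported_speed)
print("path length (mm):", activation_path_mm(2.0, 0.1))
print("implied arrival time (ms):", t_ms)
print("round trip speed (mm/ms):", wave_speed(2.0, 0.1, t_ms))
```

The path length is the same for every grid spacing, so differences in wave speed between the rows above come entirely from differences in the far-corner arrival time.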
Restart method:
HPCx has a maximum run time of 12 hours, while the initial coupled electromechanics model took 3 weeks to solve, so restarting the simulation is very important.
fem write matrix;init/Cube_yq_9ms matrices yq binary
fem write matrix;init/Cube_yqs_9ms matrices yqs binary
These commands export the electrophysiology solve data to binary files, which can be read back in after the problem has been set up again by:
fem read matrix;init/Cube_yq_9ms matrices yq binary
fem read matrix;init/Cube_yqs_9ms matrices yqs binary
You need to create a restart ipmatc file that has no stimulus current; this file is read in when the solution is restarted.
NOTE: do not export the yqs and yq vectors while a stimulus current is being applied unless this is explicitly accounted for. Otherwise the restarted solution will not match the solution from an uninterrupted solve.
Parallel problems:
When running in parallel you may get a whole file of zeros. In my experience this means the cell model is dying somewhere.
If using CellML, make sure you save all of the derived variables. Not doing this causes errors somewhere in MARCH8.f; where exactly I am not sure. This may be due to the allocation of memory for the FCN array in RHS, but it should not be. This may be a compiler issue?
When I write out the variables the problem disappears(?). Store the variables in a temporary array and write them out outside of the parallel loop.
NOTE: When comparing parallel code in 1 and 2 thread mode you may notice that OMP runs for loops in an arbitrary order, so for i = 1..5 may result in i = 3, i = 5, i = 1, ... or whatever, but more than likely not i = 1,2,3,4,5. Even if you are running one thread in mt mode the order will be random.
To get round this, so you can compare 1 and 2 thread versions, when writing out variables to debug parallel code write out the time first, followed by the index number, then whatever variable you want.
Run cmiss in one and two thread mode and store the output in a file:
For 1 thread (tcsh):
setenv OMP_NUM_THREADS 1
mycmiss-mt $myfile.com > output.file_1_thread
Then for 2 threads:
setenv OMP_NUM_THREADS 2
mycmiss-mt $myfile.com > output.file_2_thread
This will store the cmiss output in output.file_1_thread and output.file_2_thread.
Then run sort on the two outputs (-n sorts numerically):
sort -n output.file_1_thread > sorted_1
sort -n output.file_2_thread > sorted_2
Then compare the two:
diff sorted_1 sorted_2
This will compare the variables at the correct time and grid point for the 1 and 2 thread versions. At least for 1 iteration you want these two outputs to be exactly the same to about 12 significant figures.
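diff reports any textual difference, so it cannot by itself say whether two values agree to about 12 significant figures. The following is a minimal Python sketch of a tolerance-based comparison. It assumes each debug line has the form "time index value" separated by whitespace, as suggested above; the function names are hypothetical, and a relative tolerance of 1e-12 is used as an approximation to 12 significant figures.

```python
import math


def to_float(text):
    """Parse a number, accepting Fortran style D exponents such as 1.0D-03."""
    return float(text.replace("D", "E").replace("d", "e"))


def parse_debug_lines(lines):
    """Return {(time, index): value} for lines of the form 'time index value'."""
    records = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # not a debug line, e.g. other cmiss output
        try:
            key = (to_float(fields[0]), int(fields[1]))
            records[key] = to_float(fields[2])
        except ValueError:
            continue  # three fields, but not numeric
    return records


def compare_runs(lines_a, lines_b, sig_figs=12):
    """List (key, value_a, value_b) for entries that are missing or differ."""
    a = parse_debug_lines(lines_a)
    b = parse_debug_lines(lines_b)
    rel_tol = 10.0 ** (-sig_figs)
    mismatches = []
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            mismatches.append((key, a.get(key), b.get(key)))
        elif not math.isclose(a[key], b[key], rel_tol=rel_tol, abs_tol=0.0):
            mismatches.append((key, a[key], b[key]))
    return mismatches
```

Because the records are keyed on (time, index), the order in which the threads wrote the lines does not matter and the files do not need to be sorted first. Passing two open file objects, for example compare_runs(open("output.file_1_thread"), open("output.file_2_thread")), returns an empty list when the two runs agree.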
Errors:
Sometimes HPCx will give you a
Trace/BPT trap (core dumped)
you can get more info on this error from
http://www.hpcx.ac.uk/support/FAQ/crash/
Memory problems:
The memory allocations differ on different machines.
On HPC (Auckland), interactive mode:
ulimit -a
time(seconds)        unlimited
file(blocks)         18388604
data(kbytes)         327680000
stack(kbytes)        4194304
memory(kbytes)       327680000
coredump(blocks)     0
nofiles(descriptors) 400000
ulimit -aH
time(seconds)        unlimited
file(blocks)         18388604
data(kbytes)         unlimited
stack(kbytes)        4194304
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
On llsubmit:
ulimit -a
time(seconds)        unlimited
file(blocks)         18388604
data(kbytes)         unlimited
stack(kbytes)        4194304
memory(kbytes)       unlimited
coredump(blocks)     2097151
nofiles(descriptors) unlimited
ulimit -aH
time(seconds)        unlimited
file(blocks)         18388604
data(kbytes)         unlimited
stack(kbytes)        4194304
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
On HPCx (Oxford), interactive mode:
ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         819200
stack(kbytes)        409600
memory(kbytes)       409600
coredump(blocks)     204800
nofiles(descriptors) 2000
ulimit -aH
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        4194304
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
On llsubmit:
ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         28249129
stack(kbytes)        20480
memory(kbytes)       28269609
coredump(blocks)     67108864
nofiles(descriptors) 2000
ulimit -aH
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         28249129
stack(kbytes)        20480
memory(kbytes)       28269609
coredump(blocks)     67108864
nofiles(descriptors) unlimited
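The same soft and hard limits can also be read from inside a job, which is handy when the interactive and batch environments differ as they do above. This is a minimal sketch using Python's standard resource module (Unix only). It is not part of the original workflow, and it assumes that the memory(kbytes) entry printed by ulimit corresponds to RLIMIT_RSS, which may not hold on every system.

```python
import resource

# ulimit label -> (resource constant, divisor converting bytes to the ulimit unit)
LIMITS = {
    "data(kbytes)": (resource.RLIMIT_DATA, 1024),
    "stack(kbytes)": (resource.RLIMIT_STACK, 1024),
    "memory(kbytes)": (resource.RLIMIT_RSS, 1024),  # assumed mapping
    "nofiles(descriptors)": (resource.RLIMIT_NOFILE, 1),
}


def show(value, divisor):
    """Format a limit the way ulimit does: a number or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return value // divisor


def read_limits():
    """Return {label: (soft, hard)} for the limits listed in LIMITS."""
    report = {}
    for label, (which, divisor) in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        report[label] = (show(soft, divisor), show(hard, divisor))
    return report


for label, (soft, hard) in read_limits().items():
    print(f"{label:22s} soft={soft} hard={hard}")
```

Running this at the start of a batch job and in an interactive session gives a like-for-like record of the limits that actually apply to each.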
HPCx does not allow for virtual memory, i.e. virtual memory may not be larger than the resident memory, so swapping is not allowed.
lsps may provide information on the virtual memory.
fem def para;w;filename minimal outputs an ippara file that has a guess at the minimal memory requirements for a problem.
To do: check with Tiong whether there is 327 GB of memory on Auckland (sounds like a lot). ulimit seems to be capable of 64 bit memory.
example b211v print_memory_use
CMISS memory errors:
Breaching a stack size limit would result in a SIGSEGV.
To fix: see the HPCx web page http://www.hpcx.ac.uk/support/FAQ/stack.txt
A data limit would result in a failure to allocate memory, which would be reported by cm.
May not be able to fix?
I don't know how a resident set size (memory?) limit would present - perhaps a SIGKILL.
LoadLeveler will not allow you to overstep resident memory on HPCx, where resident = data + stack. If data + stack > resident, LoadLeveler gives an error and will not put the job in the queue.
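The resident = data + stack relation can be checked against the llsubmit limits listed for HPCx earlier on this page. The following is a minimal Python sketch; the function name is hypothetical and the check only mirrors the rule as stated here, not LoadLeveler's actual implementation.

```python
def exceeds_resident_limit(data_kb, stack_kb, resident_kb):
    """True if data + stack is larger than the resident memory limit."""
    return data_kb + stack_kb > resident_kb


# HPCx llsubmit limits from the ulimit -a listing (all in kbytes).
hpcx_data_kb = 28249129
hpcx_stack_kb = 20480
hpcx_memory_kb = 28269609

# The listed memory limit is exactly data + stack.
print(hpcx_data_kb + hpcx_stack_kb == hpcx_memory_kb)
print(exceeds_resident_limit(hpcx_data_kb, hpcx_stack_kb, hpcx_memory_kb))
```

With these values the sum matches the memory limit exactly, so any increase in the requested stack has to be paid for by an equal decrease in the data limit.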
Mechanics:
Solving the ellipse in cm64-mt (opt) gives different answers for 1 and 2 threads. Solving the ellipse in cm64-mt-debug gives an error in ZPRP_DYNAM.f at line:
132 YP(ny,4)=YP(ny,4)+RE(ns,nh)
TODO: find out why.
Coupled: