Updates to geoclaw bouss calling of PETSc to support shared memory #632

Open
wants to merge 1 commit into base: master

Conversation

BarrySmith

Shared memory is used to transfer data from OpenMP code to MPI processes instead of nonscalable message passing

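For context, the new path is driven through PETSc's MPI linear solver server options. The sketch below shows one way the server and the new shared-memory transfer might be enabled; the option names appear in this thread and in PETSc's linear solver server, but the process count, executable path, and use of PETSC_OPTIONS rather than geoclaw's petscMPIoptions file are illustrative assumptions, not the geoclaw defaults:

    # Illustrative sketch only: enable the PETSc MPI linear solver server with the
    # shared-memory transfer discussed in this PR.  Whether geoclaw expects these
    # options on the command line, via PETSC_OPTIONS, or only in its petscMPIoptions
    # file is an assumption here.
    export PETSC_OPTIONS="-mpi_linear_solver_server -mpi_linear_solver_server_use_shared_memory true -mpi_linear_solver_server_view"
    mpiexec -n 6 ./xgeoclaw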
@BarrySmith
Author

Note that this new code requires PETSc 3.21 or later (3.21 was released in March).

We should talk about adding the version test to the geoclaw makefiles later.
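One form such a test could take, sketched here only as a possibility and not as part of this PR; it assumes PETSC_DIR points at the installation and that petscversion.h defines PETSC_VERSION_MAJOR and PETSC_VERSION_MINOR, as it does in standard PETSc installs:

    # Hypothetical pre-build check a geoclaw makefile could run.
    major=$(awk '/#define PETSC_VERSION_MAJOR/ {print $3; exit}' "$PETSC_DIR/include/petscversion.h")
    minor=$(awk '/#define PETSC_VERSION_MINOR/ {print $3; exit}' "$PETSC_DIR/include/petscversion.h")
    if [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 21 ]; }; then
        echo "Error: the shared-memory bouss code requires PETSc >= 3.21 (found $major.$minor)" >&2
        exit 1
    fi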

@rjleveque
Member

Thanks @BarrySmith!

@mjberger tells me we still need to do some more timing comparisons between this and the original version, which I will try to work with her on in the near future.

@rjleveque
Member

I tried running this along with my new petscMPIoptions file from #631. It still runs fine with

    -mpi_linear_solver_server_use_shared_memory false

but with

    -mpi_linear_solver_server_use_shared_memory true

I get the following...

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Error in external library
[0]PETSC ERROR: shmat() of shmid 65544 returned 0xffffffffffffffff Too many open files
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR:   Option left: name:-ksp_max_it value: 200 source: file
[0]PETSC ERROR:   Option left: name:-ksp_reuse_preconditioner (no value) source: file
[0]PETSC ERROR:   Option left: name:-ksp_rtol value: 1.e-9 source: file
[0]PETSC ERROR:   Option left: name:-ksp_type value: gmres source: file
[0]PETSC ERROR:   Option left: name:-mpi_linear_solver_server_minimum_count_per_rank value: 5000 source: file
[0]PETSC ERROR:   Option left: name:-mpi_linear_solver_server_view (no value) source: file
[0]PETSC ERROR:   Option left: name:-pc_type value: gamg source: file
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.22.1, unknown 

[0]PETSC ERROR: /Users/rjl/git/clawpack/geoclaw/examples/bouss/radial_flat/xgeoclaw with 6 MPI process(es) and PETSC_ARCH arch-darwin-c-opt on niwot by rjl Sat Nov  9 09:07:05 2024
[0]PETSC ERROR: Configure options: --with-debugging=0
[0]PETSC ERROR: #1 PetscShmgetAllocateArray() at /Users/rjl/git/Clones/petsc/src/sys/utils/server.c:279
[0]PETSC ERROR: #2 VecCreate_Seq() at /Users/rjl/git/Clones/petsc/src/vec/vec/impls/seq/bvec3.c:33
[0]PETSC ERROR: #3 VecSetType() at /Users/rjl/git/Clones/petsc/src/vec/vec/interface/vecreg.c:161
[0]PETSC ERROR: #4 VecCreateSeq() at /Users/rjl/git/Clones/petsc/src/vec/vec/impls/seq/vseqcr.c:34
[0]PETSC ERROR: #5 /Users/rjl/git/clawpack/geoclaw/src/2d/bouss/implicit_update_bouss_2Calls.f90:47
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
with errorcode 76.
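For what it's worth, the shmat() failure above corresponds to errno EMFILE ("Too many open files"), which for shmat() means the limit on attached System V shared-memory segments was hit. One guess (not a confirmed diagnosis) is stale segments left behind by earlier crashed runs, or a low per-process/system-wide segment limit; something like the following can be used to inspect the situation, where the macOS sysctl names are an assumption about this machine:

    # List existing System V shared-memory segments; stale ones from crashed runs
    # can be removed with ipcrm once no process is attached to them, e.g.
    #   ipcrm -m 65544
    ipcs -m
    # On macOS, the system-wide and per-process segment limits:
    sysctl kern.sysv.shmmni kern.sysv.shmseg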

@rjleveque
Member

I just tried this on a different problem, and again it runs fine with -mpi_linear_solver_server_use_shared_memory false, but when it is true I now get a different error; see below. This problem is set up to use SWE for the first 60 seconds and then switch to the Boussinesq equations, so PETSc is not used at earlier times, and it dies at t = 60:

[...]
 AMRCLAW: level  1  CFL = .406E+00  dt = 0.1000E+02  final t = 0.600000E+02
Writing fgout grid #   1  frame   12 at time =    0.55000000E+02
Writing fgout grid #   2  frame   12 at time =    0.55000000E+02
Writing fgout grid #   1  frame   13 at time =    0.60000000E+02
Writing fgout grid #   2  frame   13 at time =    0.60000000E+02
 Using/Switching to Boussinesq equations
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Petsc has generated inconsistent data
[0]PETSC ERROR: Unable to locate PCMPI allocated shared address 0x141208000
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR:   Option left: name:-ksp_max_it value: 200 source: file
[0]PETSC ERROR:   Option left: name:-ksp_reuse_preconditioner (no value) source: file
[0]PETSC ERROR:   Option left: name:-ksp_rtol value: 1.e-9 source: file
[0]PETSC ERROR:   Option left: name:-ksp_type value: gmres source: file
[0]PETSC ERROR:   Option left: name:-mpi_linear_solver_server_view (no value) source: file
[0]PETSC ERROR:   Option left: name:-pc_type value: gamg source: file
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.22.1, unknown 
[0]PETSC ERROR: /Users/rjl/git/CopesHubTsunamis/geoclaw_runs/xgeoclaw_bouss_v511_sharedmem with 6 MPI process(es) and PETSC_ARCH arch-darwin-c-opt on niwot.local by rjl Sat Nov 16 06:31:46 2024
[0]PETSC ERROR: Configure options: --with-debugging=0
[0]PETSC ERROR: #1 PetscShmgetMapAddresses() at /Users/rjl/git/Clones/petsc/src/sys/utils/server.c:114
[0]PETSC ERROR: #2 PCMPISetMat() at /Users/rjl/git/Clones/petsc/src/ksp/pc/impls/mpi/pcmpi.c:269
[0]PETSC ERROR: #3 PCSetUp_MPI() at /Users/rjl/git/Clones/petsc/src/ksp/pc/impls/mpi/pcmpi.c:853
[0]PETSC ERROR: #4 PCSetUp() at /Users/rjl/git/Clones/petsc/src/ksp/pc/interface/precon.c:1071
[0]PETSC ERROR: #5 KSPSetUp() at /Users/rjl/git/Clones/petsc/src/ksp/ksp/interface/itfunc.c:415
[0]PETSC ERROR: #6 KSPSolve_Private() at /Users/rjl/git/Clones/petsc/src/ksp/ksp/interface/itfunc.c:826
[0]PETSC ERROR: #7 KSPSolve() at /Users/rjl/git/Clones/petsc/src/ksp/ksp/interface/itfunc.c:1075
