Failure when using more than 1 GPU in STRUMPACK MPI #126

Open
jinghu4 opened this issue Nov 28, 2024 · 5 comments


jinghu4 commented Nov 28, 2024

Hi, Dr. Ghysels,

I have run into some issues when using the multi-GPU feature of STRUMPACK to solve a sparse matrix. I built STRUMPACK successfully with support for SLATE and MAGMA.

  1. When I run the STRUMPACK test cases with "make test", both sparse_mpi and reuse_structure_mpi fail:
# multifrontal factorization:
#   - estimated memory usage (exact solver) = 0.178864 MB
#   - minimum pivot, sqrt(eps)*|A|_1 = 1.05367e-08
#   - replacing of small pivots is not enabled
CUDA assertion failed: invalid resource handle ~/STRUMPACK-v8.0.0/STRUMPACK-8.0.0/src/dense/CUDAWrapper.cu 114
[gpu01:2817703] *** Process received signal ***

However, it passes when I run with one GPU:
OMP_NUM_THREADS=1 mpirun -n 1 test_structure_reuse_mpi pde900.mtx

  2. Random failures when solving a sparse matrix with STRUMPACK multi-GPU.
    For example, running with 2 GPUs:
mpirun -n 2 --mca pml ucx myApplication.exe

a) Sometimes it passes:

OMP: Info #277: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
# DenseMPI factorization complete, GPU=1, P=2, T=10: 0.170223 seconds, 0.00550864 GFLOPS, 0.0323613 GFLOP/s,  ds=203, du=0 

(Why GPU =1 here? Does it mean only one GPU is used, with the two processes sharing it rather than running on each of the GPUs I requested?)

b) Sometimes it fails with this error message:

# multifrontal factorization:
#   - estimated memory usage (exact solver) = 23.5596 MB
#   - minimum pivot, sqrt(eps)*|A|_1 = 1.05367e-08
#   - replacing of small pivots is not enabled
cuSOLVER assertion failed: 6 ~/STRUMPACK-v8.0.0/STRUMPACK-8.0.0/src/dense/CUDAWrapper.cpp 614
CUSOLVER_STATUS_EXECUTION_FAILED

Do you know what could be causing these issues, and how I should resolve them?

Best,
-Jing


pghysels commented Nov 28, 2024

The GPU =1 means that GPU is enabled, otherwise it would be GPU =0.
Sorry that is confusing, I will fix that.

The OMP deprecation message is probably coming from the SLATE library.

I believe the invalid resource handle message appears because multiple MPI processes are using the same GPU, so more CUDA streams are being created than are allowed per GPU.
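
If you want to check that, here is a small diagnostic sketch (my own illustration, not STRUMPACK code) that prints how many devices each rank sees and which device a rank-modulo-device-count mapping would give it; with more ranks than GPUs you will see several ranks land on the same device:

```cpp
// Diagnostic sketch (illustration only, not part of STRUMPACK): print, for each
// MPI rank, the number of visible CUDA devices and the device index that a
// rank % device_count mapping would select.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  int devs = 0;
  cudaGetDeviceCount(&devs);
  // When size > devs, several ranks map to the same device and share it.
  printf("rank %d of %d: %d visible device(s), would use device %d\n",
         rank, size, devs, devs > 0 ? rank % devs : -1);
  MPI_Finalize();
  return 0;
}
```

Compile with something like `mpicxx check_gpus.cpp -lcudart` and run it with the same mpirun command you use for your application.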

@pghysels
Owner

This changes the GPU =1 to GPU enabled:
115b152

@pghysels
Owner

When you run with P MPI ranks on a machine with D GPUs, MPI rank p will use device d = p % D. For example, with 4 ranks and 2 GPUs, ranks 0 and 2 use device 0 and ranks 1 and 3 use device 1.


jinghu4 commented Nov 28, 2024

Yes. But what confuses me is that when I run

mpirun -n 2 myApplication

all 8 GPUs on the node show these two process IDs running.

Even when I use cudaSetDevice to assign rank 0 to GPU 0 and rank 1 to GPU 1,
I can still see both processes running on both GPU 0 and GPU 1.

[screenshot attached]

@pghysels
Owner

Hmm, I'm not sure.
STRUMPACK calls cudaSetDevice, see here:

cudaSetDevice(rank % devs);

This is called from the SparseSolver constructor, so perhaps it overrides what you specify. But it should not use all GPUs. Maybe SLATE is doing that?

You could try setting the CUDA_VISIBLE_DEVICES environment variable, but you need to set it differently for each MPI rank.
You can do that in a small wrapper script which you then run through mpirun, as explained here:
https://medium.com/@jeffrey_91423/binding-to-the-right-gpu-in-mpi-cuda-programs-263ac753d232
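
For reference, this is roughly what such a wrapper does, sketched here directly in C++ instead of a shell script (a common pattern, not a STRUMPACK API; it assumes Open MPI, which exports OMPI_COMM_WORLD_LOCAL_RANK to every rank, while with Slurm you would read SLURM_LOCALID instead). Setting CUDA_VISIBLE_DEVICES before MPI_Init and before any CUDA call means each rank only ever sees its own GPU:

```cpp
// Sketch (not STRUMPACK code): restrict each MPI rank to one GPU by setting
// CUDA_VISIBLE_DEVICES from the node-local rank provided by the launcher,
// before MPI_Init and before any CUDA call. Assumes Open MPI, which exports
// OMPI_COMM_WORLD_LOCAL_RANK; adjust the variable name for other launchers.
#include <mpi.h>
#include <cstdlib>
#include <cstdio>

int main(int argc, char** argv) {
  const char* local = std::getenv("OMPI_COMM_WORLD_LOCAL_RANK");
  if (local) {
    // Node-local rank r only sees GPU r; adapt if ranks outnumber GPUs per node.
    setenv("CUDA_VISIBLE_DEVICES", local, 1);
  }
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("rank %d: CUDA_VISIBLE_DEVICES=%s\n", rank, local ? local : "(not set)");
  // ... construct the STRUMPACK sparse solver and run the application here ...
  MPI_Finalize();
  return 0;
}
```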
