STRUMPACK with > 1 GPU per node #108
Comments
You should run 1 MPI process per GPU (it sounds like that is what you are trying to do). But if an MPI process sees multiple GPU devices, in STRUMPACK we select one of them with cudaSetDevice, based on the MPI rank. What exactly do you mean with "each process is assigned in my own code to devices 0 and 1"?
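For illustration, here is a minimal sketch of that kind of rank-based device selection; the `rank % device_count` policy and the surrounding structure are assumptions for the example, not a copy of the STRUMPACK source:

```cpp
// Sketch: each MPI rank picks one of the GPUs it can see, based on its rank.
// Illustrative only; not the actual STRUMPACK implementation.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  int device_count = 0;
  cudaGetDeviceCount(&device_count);
  int device = rank % device_count;  // round-robin over visible devices
  cudaSetDevice(device);

  std::printf("rank %d: %d visible device(s), using device %d\n",
              rank, device_count, device);

  MPI_Finalize();
  return 0;
}
```

On multi-node runs one would typically map devices from the node-local rank (for example obtained via `MPI_Comm_split_type` with `MPI_COMM_TYPE_SHARED`) rather than the global rank.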
I mean I am calling cudaSetDevice myself in my own code, assigning one device to each MPI process. I'm not sure why this isn't working, though. It seems like the strategy from inside of STRUMPACK of calling cudaSetDevice based on the rank should work as well.
Could it be due to MAGMA? Can you try without MAGMA? I think the code at that line is not executed when running without MAGMA.
Hm, odd. Building without MAGMA I still get an error, and it occurs outside of the STRUMPACK solve. Again, everything is totally fine when using SuperLU_DIST (which is using the GPU) or a CPU-based direct solve. It seems like maybe STRUMPACK is corrupting memory somewhere?
I'll see if I can find a machine with multiple GPUs per node.
Awesome, thanks! For reference, I'm running on AWS, on a single p3.8xlarge EC2 instance (4 x V100 GPUs).
I can reproduce it on Perlmutter, using setup (1): https://docs.nersc.gov/systems/perlmutter/running-jobs/#1-node-4-tasks-4-gpus-all-gpus-visible-to-all-tasks

The only difference between setups (1) and (2) is that for (1), where all GPUs are visible to all tasks, STRUMPACK calls cudaSetDevice itself to select a device. I did also notice that it runs fine with setup (1) when I add … I will investigate further.

You say it works with SuperLU. Did you set …?
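As a generic way to see whether a library call changes the device the application selected (which is the key difference between setups (1) and (2) above), one can compare `cudaGetDevice` before and after the call. A sketch; `run_and_check_device` is a hypothetical helper, not part of STRUMPACK:

```cpp
// Generic diagnostic: report if the current CUDA device changed across a
// library call that may itself call cudaSetDevice internally.
#include <cuda_runtime.h>
#include <cstdio>

template <typename F>
void run_and_check_device(int rank, const char* label, F&& call) {
  int before = -1, after = -1;
  cudaGetDevice(&before);
  call();                       // e.g. wrap the factorization or solve here
  cudaGetDevice(&after);
  if (before != after)
    std::printf("rank %d: current device changed from %d to %d during %s\n",
                rank, before, after, label);
}
```

Wrapping the factorization and solve calls this way on each rank would show whether, under setup (1), the solver moves a rank off the device the application assigned to it.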
Awesome, thank you for your help with this, and great that you can reproduce it. No, I did not set that.
It works correctly with SuperLU with … I can't figure out what is wrong. I know that calling … Perhaps you can set …
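One common workaround in this situation (not necessarily what was being suggested above) is to restrict each rank to a single GPU via `CUDA_VISIBLE_DEVICES` before the first CUDA runtime call, which effectively reproduces setup (2). A sketch, assuming one rank per GPU and that MPI_Init does not itself initialize CUDA (the case for non-CUDA-aware MPI):

```cpp
// Sketch: pin each MPI rank to one GPU by restricting visibility before
// any CUDA runtime call. Assumes one rank per GPU on the node.
#include <mpi.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  // Node-local rank: ranks sharing a node are numbered 0, 1, 2, ...
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);
  int local_rank;
  MPI_Comm_rank(node_comm, &local_rank);

  // Make only one device visible to this process; this must happen before
  // the CUDA runtime is initialized in this process.
  char dev[8];
  std::snprintf(dev, sizeof(dev), "%d", local_rank);
  setenv("CUDA_VISIBLE_DEVICES", dev, 1);

  // ... create and run the GPU-enabled solver after this point ...

  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}
```

On Slurm systems the same effect can usually be obtained from the job launcher (per-task GPU binding) instead of from the application code.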
I wonder if the issue is somehow an interplay between STRUMPACK and SLATE (I noticed SLATE also has calls to cudaSetDevice).
I also see the issue without linking with SLATE (or MAGMA).
Original issue:
I'm having some issues running STRUMPACK with more than a single GPU per node. By comparison, SuperLU_DIST is fine. This is with CUDA. In particular, if I run with 2 MPI processes (MPI is not CUDA-aware), where each process is assigned in my own code to devices 0 and 1 respectively, I get errors.
Is there anything special we need to do when building STRUMPACK + MPI with CUDA support?
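For concreteness, a minimal sketch of the kind of per-rank device assignment described in the report; the structure and the placeholder solver call are assumptions, not the reporter's actual code:

```cpp
// Hypothetical reconstruction of the reported setup: 2 MPI ranks on one
// node, each bound to its own GPU before the distributed solve.
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Rank 0 -> device 0, rank 1 -> device 1, as described in the report.
  cudaSetDevice(rank);

  // ... assemble the distributed matrix and call the GPU-enabled direct
  //     solver (STRUMPACK or SuperLU_DIST) here ...

  MPI_Finalize();
  return 0;
}
```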