[WIP] MGPU downstream updates #2424

Draft: wants to merge 4 commits into main

1tnguyen (Collaborator) commented Nov 27, 2024

Description

Update the mgpu SHA to incorporate the following changes (see mgpu MR!34):

  • Fixes for multi-node NVLink state vector distribution.

  • Fixes for integer overflow in certain scenarios.

  • Build script updates for mgpu backends, e.g., a new dependency (the CUDA driver library, needed for multi-node NVLink support) and better libcustatevec detection (Implement a FIXME in GitLab pipeline #2433).

For the CUDA-Q build: the new mgpu backends' .so files depend on the CUDA driver library (libcuda.so.1), so we exclude it from the wheel packaging, as we do for other NVIDIA dependencies.
Note: libcuda.so.1 ships with the NVIDIA driver, so it should already be available on any system with a GPU.
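
Since libcuda.so.1 is no longer bundled in the wheel, it has to be resolvable by the dynamic linker at runtime. A minimal sketch of how a user could check this before selecting an mgpu backend is below. The helper name `cuda_driver_available` is hypothetical and not part of CUDA-Q; it only uses the standard `ctypes` module.

```python
import ctypes


def cuda_driver_available(soname: str = "libcuda.so.1") -> bool:
    """Return True if the dynamic linker can load the given shared library.

    Hypothetical helper for illustration only; not a CUDA-Q API.
    """
    try:
        # CDLL raises OSError when the library cannot be found or loaded.
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    if cuda_driver_available():
        print("libcuda.so.1 is loadable; the NVIDIA driver appears to be installed.")
    else:
        print("libcuda.so.1 is not loadable; check the NVIDIA driver installation.")
```

This only confirms that the library can be loaded. It does not check the driver version or whether a GPU is actually present.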

Also, update docs for CUDAQ_HOST_DEVICE_MIGRATION_LEVEL.

TODO Checklist:

  • Update MGPU SHA
  • Check Publishing pipeline

1tnguyen and others added 4 commits November 26, 2024 23:57
As the new libs will depend on runtime libcuda.so.1, we exclude it from
the wheel packaging.

Note: libcuda.so.1 comes with the NVIDIA driver (i.e., should be
available on systems with GPU).

Signed-off-by: Thien Nguyen <[email protected]>