Introduce cuda.cooperative overloads not requiring temporary storage #2528
🟩 CI finished in 14m 16s: Pass: 100%/1 | Total: 14m 16s | Avg: 14m 16s | Max: 14m 16s
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | pycuda |
| | CCCL C Parallel Library |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | pycuda |
| | CCCL C Parallel Library |
🏃 Runner counts (total jobs: 1)
# | Runner |
---|---|
1 | linux-amd64-gpu-v100-latest-1 |
A few things to consider before merging: a) a sync is required before subsequent calls, which is not obvious, so we might need to add a sync inside the call.
python/cuda_cooperative/cuda/cooperative/experimental/_types.py
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
/ok to test
🟩 CI finished in 23m 47s: Pass: 100%/1 | Total: 23m 47s | Avg: 23m 47s | Max: 23m 47s
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 1)
# | Runner |
---|---|
1 | linux-amd64-gpu-v100-latest-1 |
…VIDIA#2528)
* Modernize pkg resource query
* Add cooperative overloads without shared memory
* Start fixing temp storage
* Incorporate template params into mangling
* Condense dict access
* Fix temporary storage indexing for sub hw warps
* Test multiple warps
* Disable alloc API for sub hw warps
Description
closes #2527
This PR introduces overloads of the cooperative algorithms that do not require temporary storage. It is a quick fix for the temporary storage alignment issues that arise when a kernel uses more than one shared memory array.
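To illustrate the kind of alignment problem mentioned above, here is a minimal plain-Python sketch. It is only a model of the general concern, not a description of how Numba or `cuda.cooperative` actually lays out shared memory. The array names, sizes, and the 16-byte alignment requirement are hypothetical, as are the helpers `align_up` and `pack`.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def pack(arrays, respect_alignment):
    """Place (name, size_bytes, alignment) arrays back to back in one buffer.

    Returns a dict mapping each array name to its byte offset.
    """
    offsets = {}
    cursor = 0
    for name, size, alignment in arrays:
        if respect_alignment:
            cursor = align_up(cursor, alignment)
        offsets[name] = cursor
        cursor += size
    return offsets


# Hypothetical layout: a small user array followed by algorithm scratch space.
arrays = [
    ("user_flags", 3, 1),      # 3 bytes of uint8 flags, no alignment need
    ("temp_storage", 64, 16),  # scratch space assumed to need 16-byte alignment
]

naive = pack(arrays, respect_alignment=False)
padded = pack(arrays, respect_alignment=True)

print("naive offsets: ", naive)
print("padded offsets:", padded)
```

In this model, the scratch buffer is fine on its own but lands on an unaligned offset once a second array precedes it, unless padding is inserted. An overload that does not take temporary storage from the caller avoids exposing that placement to user code.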
Checklist