-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dynamic CUB dispatch for merge_sort #3525
Add dynamic CUB dispatch for merge_sort #3525
Conversation
🟨 CI finished in 1h 47m: Pass: 90%/90 | Total: 2d 15h | Avg: 42m 22s | Max: 1h 11m | Hits: 221%/12772
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 90)
# | Runner |
---|---|
65 | linux-amd64-cpu16 |
11 | linux-amd64-gpu-v100-latest-1 |
9 | windows-amd64-cpu16 |
4 | linux-arm64-cpu16 |
1 | linux-amd64-gpu-h100-latest-1-testing |
🟨 CI finished in 3h 31m: Pass: 90%/90 | Total: 2d 17h | Avg: 43m 27s | Max: 1h 18m | Hits: 192%/10928
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 90)
# | Runner |
---|---|
65 | linux-amd64-cpu16 |
11 | linux-amd64-gpu-v100-latest-1 |
9 | windows-amd64-cpu16 |
4 | linux-arm64-cpu16 |
1 | linux-amd64-gpu-h100-latest-1-testing |
…ct constexpr `BLOCK_THREADS` if available
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
…NaderAlAwar/cccl into merge-sort-dynamic-cub-dispatch
/ok to test |
… to select constexpr `BLOCK_THREADS` if available" This reverts commit 7cced5e.
…here instead of the policy directly
/ok to test |
/ok to test |
🟨 CI finished in 37m 03s: Pass: 97%/89 | Total: 14h 09m | Avg: 9m 32s | Max: 33m 04s | Hits: 422%/10936
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 89)
# | Runner |
---|---|
65 | linux-amd64-cpu16 |
8 | windows-amd64-cpu16 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
4 | linux-arm64-cpu16 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
2 | linux-amd64-gpu-rtx2080-latest-1 |
1 | linux-amd64-gpu-h100-latest-1 |
🟩 CI finished in 1h 46m: Pass: 100%/89 | Total: 1d 16h | Avg: 27m 23s | Max: 1h 00m | Hits: 398%/10936
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 89)
# | Runner |
---|---|
65 | linux-amd64-cpu16 |
8 | windows-amd64-cpu16 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
4 | linux-arm64-cpu16 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
2 | linux-amd64-gpu-rtx2080-latest-1 |
1 | linux-amd64-gpu-h100-latest-1 |
@NaderAlAwar please also provide a SASS diff next time to be sure we did not mess up the code gen. This is usually done like this (in the build directory):
You should pick a test or benchmark according to your changes and make sure the I tested |
Description
closes #3387
Analogous to #2591 and #3398 , this PR extends the CUB dispatch layer for
merge_sort
to support dynamic kernel launching which will later be used by c.parallel.Checklist