Fix SM100 histogram tunings #3691
🟩 CI finished in 1h 42m: Pass: 100%/90 | Total: 2d 16h | Avg: 43m 05s | Max: 1h 16m | Hits: 214%/13398
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 90)
# | Runner |
---|---|
65 | linux-amd64-cpu16 |
9 | windows-amd64-cpu16 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
4 | linux-arm64-cpu16 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
2 | linux-amd64-gpu-rtx2080-latest-1 |
1 | linux-amd64-gpu-h100-latest-1 |
Please don't merge until we have a perf diff from @gonidelis.
preliminary
From looking at the code, we only provide a tuning for |
From the benchmark results it seems that also only the |
7623d12 to 2699952
🟩 CI finished in 1h 38m: Pass: 100%/90 | Total: 2d 19h | Avg: 44m 40s | Max: 1h 19m | Hits: 56%/132225
@gonidelis please provide multi-histogram benchmarks as well. Thx!
98eae95 to e28fed2
The tuning data member names did not match the ones used when selecting tunings, so all SM100 tunings were SFINAE-ed out.
e28fed2 to 30f5b21
🟩 CI finished in 1h 10m: Pass: 100%/90 | Total: 1d 16h | Avg: 27m 18s | Max: 1h 04m | Hits: 90%/132225
Backport failed. Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin branch/2.8.x
git worktree add -d .worktree/backport-3691-to-branch/2.8.x origin/branch/2.8.x
cd .worktree/backport-3691-to-branch/2.8.x
git switch --create backport-3691-to-branch/2.8.x
git cherry-pick -x e7aae03124f20d8a4783d3e1668307d4a9e3bb8b
The tuning data member names did not match the ones used when selecting tunings, so all SM100 tunings were SFINAE-ed out. Also drop tunings with no benefit.
* Add b200 tunings for histogram (#3616)
  Co-authored-by: Giannis Gonidelis <[email protected]>
* Fix SM100 histogram tunings (#3691)
  The tuning data member names did not match the one used when selecting tunings, so all SM100 tunings were SFINAE-ed out. Also drop tunings with no benefit.
---------
Co-authored-by: Giannis Gonidelis <[email protected]>
The tuning data member names did not match the ones used when selecting tunings, so all SM100 tunings were SFINAE-ed out.
* cub.bench.radix_sort.pairs.base is empty (no side effect)
* cub.bench.histogram.even.base contains instruction changes (tunings have effect)
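
For readers unfamiliar with how a misnamed data member can silently disable a tuning, here is a minimal sketch of the failure mode. It is not CUB's actual tuning code: every name below (`sketch`, `has_items`, `select_items`, `items`, `pixels`, `default_items`) is hypothetical and chosen only for illustration. The selector looks for a member called `items`; a tuning that spells the member differently does not produce a compile error, it just falls back to the default.

```cpp
#include <type_traits>

// Illustrative sketch only. All names are hypothetical and do not
// correspond to CUB's real tuning machinery.
namespace sketch {

// Default used when no usable tuning is detected.
constexpr int default_items = 8;

// C++11-compatible equivalent of std::void_t.
template <typename...>
struct make_void { using type = void; };
template <typename... Ts>
using void_t = typename make_void<Ts...>::type;

// Primary template: assume the tuning has no member named `items`.
template <typename Tuning, typename = void>
struct has_items : std::false_type {};

// Chosen only when `Tuning::items` is well-formed. If the member has a
// different name, substitution fails and the primary template is used.
template <typename Tuning>
struct has_items<Tuning, void_t<decltype(Tuning::items)>> : std::true_type {};

// Selector: fall back to the default unless the tuning was detected.
template <typename Tuning, bool = has_items<Tuning>::value>
struct select_items { static constexpr int value = default_items; };

template <typename Tuning>
struct select_items<Tuning, true> { static constexpr int value = Tuning::items; };

// A tuning whose member name does not match what the selector expects.
struct mismatched_tuning { static constexpr int pixels = 12; };

// The same tuning with the expected member name.
struct matching_tuning { static constexpr int items = 12; };

// The mismatched tuning compiles cleanly but is ignored.
static_assert(select_items<mismatched_tuning>::value == default_items,
              "mismatched member name falls back to the default");
static_assert(select_items<matching_tuning>::value == 12,
              "matching member name is picked up");

} // namespace sketch
```

Because the mismatch is not diagnosed by the compiler, a check on the generated code (such as the instruction comparison listed above) or an explicit `static_assert` on the detection trait for tunings that are expected to apply is one way to catch it.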