Remove reduce tunings with no benefit #3724

Merged on Feb 7, 2025 (1 commit)

bernhardmgruber

Follow-up to #3612: the verification benchmark showed that several of the selected tunings did not yield significant improvements after all. This PR removes those tunings.
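
The PR does not say how "no significant improvement" was judged, so the following is only a minimal sketch of one way such a check could look. The function names, the tuning names, the timings, and the 2% threshold are all hypothetical and are not taken from the CCCL benchmark tooling.

```python
from statistics import median


def speedup(baseline_times, tuned_times):
    """Ratio of median baseline time to median tuned time (>1 means the tuning is faster)."""
    return median(baseline_times) / median(tuned_times)


def tunings_to_remove(results, min_speedup=1.02):
    """Return the names of tunings whose measured speedup falls below min_speedup.

    `results` maps a tuning name to (baseline_times, tuned_times).
    The 2% threshold is an illustrative assumption, not a value from the PR.
    """
    return sorted(
        name
        for name, (baseline, tuned) in results.items()
        if speedup(baseline, tuned) < min_speedup
    )


# Made-up timings in microseconds, for illustration only.
results = {
    "sum_f32": ([100.0, 101.0, 99.0], [90.0, 91.0, 89.0]),
    "min_i64": ([100.0, 102.0, 98.0], [99.5, 100.5, 99.0]),
}
print(tunings_to_remove(results))
```

With these made-up numbers only the hypothetical `min_i64` entry falls under the threshold, so it is the one that would be dropped. Using the median rather than the mean is a design choice that makes the comparison less sensitive to outlier runs.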

bernhardmgruber requested a review from a team as a code owner February 6, 2025 19:37
gonidelis self-requested a review February 6, 2025 19:49
@gonidelis

I second that

miscco left a comment

I see red I hit approve


github-actions bot commented Feb 6, 2025

🟩 CI finished in 1h 38m: Pass: 100%/90 | Total: 2d 15h | Avg: 42m 20s | Max: 1h 16m | Hits: 74%/132225
  • 🟩 cub: Pass: 100%/44 | Total: 1d 15h | Avg: 53m 54s | Max: 1h 16m | Hits: 69%/52320

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 13h | Avg: 53m 39s | Max:  1h 16m | Hits:  69%/49888 
      🟩 arm64              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 12s | Max:  1h 00m | Hits:  67%/2432  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 51m | Avg: 58m 19s | Max:  1h 00m | Hits:  58%/5914  
      🟩 12.5               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  67%/2250  
      🟩 12.8               Pass: 100%/37  | Total:  1d 08h | Avg: 52m 45s | Max:  1h 16m | Hits:  70%/44156 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 59m | Avg: 59m 31s | Max:  1h 00m | Hits:  73%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 51m | Avg: 58m 19s | Max:  1h 00m | Hits:  58%/5914  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  67%/2250  
      🟩 nvcc12.8           Pass: 100%/35  | Total:  1d 06h | Avg: 52m 22s | Max:  1h 16m | Hits:  70%/42052 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 31s | Max:  1h 00m | Hits:  73%/2104  
      🟩 nvcc               Pass: 100%/42  | Total:  1d 13h | Avg: 53m 38s | Max:  1h 16m | Hits:  68%/50216 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 47m | Avg: 56m 59s | Max:  1h 00m | Hits:  68%/4872  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 57m | Avg: 58m 48s | Max: 59m 45s | Hits:  68%/2432  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 50m | Avg: 55m 23s | Max: 55m 44s | Hits:  68%/2432  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m | Hits:  68%/2432  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 48m | Avg: 49m 48s | Max:  1h 02m | Hits:  79%/8184  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 34s | Max:  1h 00m | Hits:  67%/2436  
      🟩 GCC8               Pass: 100%/1   | Total: 56m 44s | Avg: 56m 44s | Max: 56m 44s | Hits:  67%/1218  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 56m | Avg: 58m 12s | Max: 58m 37s | Hits:  67%/2436  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 01s | Max:  1h 01m | Hits:  67%/2436  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 11s | Max:  1h 01m | Hits:  67%/2432  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits:  67%/2432  
      🟩 GCC13              Pass: 100%/10  | Total:  6h 18m | Avg: 37m 52s | Max:  1h 07m | Hits:  83%/12160 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 16m | Hits:  14%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 12m | Hits:  14%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  67%/2250  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 26m | Avg: 54m 31s | Max:  1h 02m | Hits:  72%/20352 
      🟩 GCC                Pass: 100%/21  | Total: 17h 15m | Avg: 49m 18s | Max:  1h 07m | Hits:  75%/25550 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 41m | Avg:  1h 10m | Max:  1h 16m | Hits:  14%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  67%/2250  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 49m 59s | Avg: 24m 59s | Max: 25m 41s | Hits:  83%/2432  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 10h | Avg:  1h 00m | Max:  1h 16m | Hits:  62%/40160 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 09m | Avg: 31m 12s | Max:  1h 02m | Hits:  91%/9728  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 59m 59s | Max:  1h 16m | Hits:  63%/43808 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 02s | Avg: 20m 02s | Max: 20m 02s | Hits:  99%/1216  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 57s | Avg: 16m 57s | Max: 16m 57s | Hits:  99%/1216  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 11m | Avg: 23m 54s | Max: 24m 18s | Hits:  99%/3648  
      🟩 TestGPU            Pass: 100%/2   | Total: 43m 51s | Avg: 21m 55s | Max: 22m 41s | Hits:  99%/2432  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 49m 59s | Avg: 24m 59s | Max: 25m 41s | Hits:  83%/2432  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 07m | Avg:  1h 07m | Max:  1h 07m | Hits:  67%/1216  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 20h 16m | Avg:  1h 00m | Max:  1h 16m | Hits:  61%/23559 
      🟩 20                 Pass: 100%/24  | Total: 19h 15m | Avg: 48m 09s | Max:  1h 12m | Hits:  75%/28761 
    
  • 🟩 thrust: Pass: 100%/43 | Total: 23h 20m | Avg: 32m 34s | Max: 59m 53s | Hits: 78%/79625

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 01s | Avg: 18m 30s | Max: 25m 37s | Hits:  89%/3706  
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total: 22h 22m | Avg: 32m 45s | Max: 59m 53s | Hits:  78%/75920 
      🟩 arm64              Pass: 100%/2   | Total: 58m 06s | Avg: 29m 03s | Max: 30m 35s | Hits:  78%/3705  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 06m | Avg: 37m 15s | Max: 53m 06s | Hits:  73%/9256  
      🟩 12.5               Pass: 100%/2   | Total:  1h 48m | Avg: 54m 12s | Max: 54m 51s | Hits:  73%/3704  
      🟩 12.8               Pass: 100%/36  | Total: 18h 26m | Avg: 30m 43s | Max: 59m 53s | Hits:  79%/66665 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 54m 21s | Avg: 27m 10s | Max: 27m 40s | Hits:  78%/3704  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 06m | Avg: 37m 15s | Max: 53m 06s | Hits:  73%/9256  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 48m | Avg: 54m 12s | Max: 54m 51s | Hits:  73%/3704  
      🟩 nvcc12.8           Pass: 100%/34  | Total: 17h 31m | Avg: 30m 56s | Max: 59m 53s | Hits:  79%/62961 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 54m 21s | Avg: 27m 10s | Max: 27m 40s | Hits:  78%/3704  
      🟩 nvcc               Pass: 100%/41  | Total: 22h 26m | Avg: 32m 50s | Max: 59m 53s | Hits:  78%/75921 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 50s | Max: 32m 35s | Hits:  78%/7408  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 04m | Avg: 32m 08s | Max: 33m 02s | Hits:  78%/3704  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 03m | Avg: 31m 43s | Max: 32m 02s | Hits:  78%/3704  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 52s | Max: 33m 30s | Hits:  78%/3704  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 44m | Avg: 23m 25s | Max: 32m 06s | Hits:  84%/12964 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 33s | Max: 33m 29s | Hits:  78%/3706  
      🟩 GCC8               Pass: 100%/1   | Total: 29m 54s | Avg: 29m 54s | Max: 29m 54s | Hits:  78%/1853  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 05s | Max: 36m 15s | Hits:  78%/3706  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 25s | Max: 35m 19s | Hits:  78%/3706  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 03m | Avg: 31m 48s | Max: 32m 01s | Hits:  78%/3706  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 45s | Max: 34m 24s | Hits:  78%/3706  
      🟩 GCC13              Pass: 100%/8   | Total:  3h 12m | Avg: 24m 06s | Max: 36m 52s | Hits:  86%/14824 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 47m | Avg: 53m 36s | Max: 54m 06s | Hits:  53%/3692  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 30m | Avg: 50m 10s | Max: 59m 53s | Hits:  58%/5538  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 48m | Avg: 54m 12s | Max: 54m 51s | Hits:  73%/3704  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 00m | Avg: 28m 17s | Max: 33m 30s | Hits:  81%/31484 
      🟩 GCC                Pass: 100%/19  | Total:  9h 13m | Avg: 29m 09s | Max: 36m 52s | Hits:  82%/35207 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 17m | Avg: 51m 32s | Max: 59m 53s | Hits:  56%/9230  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 48m | Avg: 54m 12s | Max: 54m 51s | Hits:  73%/3704  
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total: 19h 23m | Avg: 35m 15s | Max: 57m 16s | Hits:  76%/61112 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 57m | Avg: 23m 44s | Max: 59m 53s | Hits:  85%/18513 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 21h 57m | Avg: 35m 37s | Max: 59m 53s | Hits:  75%/68516 
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 23s | Avg: 16m 27s | Max: 33m 23s | Hits:  89%/5551  
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 41s | Avg: 11m 13s | Max: 11m 45s | Hits:  99%/5558  
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s | Hits:  78%/1853  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 09m | Avg: 36m 27s | Max: 57m 16s | Hits:  74%/37031 
      🟩 20                 Pass: 100%/21  | Total: 10h 34m | Avg: 30m 13s | Max: 59m 53s | Hits:  80%/38888 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 8m 00s | Avg: 4m 00s | Max: 5m 27s | Hits: 98%/280

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 00s | Avg:  4m 00s | Max:  5m 27s | Hits:  98%/280   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total:  8m 00s | Avg:  4m 00s | Max:  5m 27s | Hits:  98%/280   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total:  8m 00s | Avg:  4m 00s | Max:  5m 27s | Hits:  98%/280   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  8m 00s | Avg:  4m 00s | Max:  5m 27s | Hits:  98%/280   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  8m 00s | Avg:  4m 00s | Max:  5m 27s | Hits:  98%/280   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  8m 00s | Avg:  4m 00s | Max:  5m 27s | Hits:  98%/280   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  8m 00s | Avg:  4m 00s | Max:  5m 27s | Hits:  98%/280   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 33s | Avg:  2m 33s | Max:  2m 33s | Hits:  97%/140   
      🟩 Test               Pass: 100%/1   | Total:  5m 27s | Avg:  5m 27s | Max:  5m 27s | Hits:  98%/140   
    
  • 🟩 python: Pass: 100%/1 | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

bernhardmgruber merged commit 3cd4d9e into NVIDIA:main Feb 7, 2025
107 of 109 checks passed

github-actions bot commented Feb 7, 2025

Backport failed for branch/2.8.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin branch/2.8.x
git worktree add -d .worktree/backport-3724-to-branch/2.8.x origin/branch/2.8.x
cd .worktree/backport-3724-to-branch/2.8.x
git switch --create backport-3724-to-branch/2.8.x
git cherry-pick -x 3cd4d9edfc57a9e9afcab7bfe183b00c5163a9a5

bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this pull request Feb 7, 2025
bernhardmgruber deleted the reduce_fix branch February 7, 2025 08:52
bernhardmgruber added a commit that referenced this pull request Feb 7, 2025
* Add b200 policies for reduce (#3612)
* Add b200 policies for cub.device.reduce.sum
* Add b200 policies for reduce.min
Co-authored-by: Giannis Gonidelis <[email protected]>

* Remove reduce tunings with no benefit (#3724)
Co-authored-by: Giannis Gonidelis <[email protected]>