Add b200 tunings for reduce.by_key #3610

Merged
merged 1 commit into from
Jan 31, 2025

bernhardmgruber
Contributor

No description provided.


🟩 CI finished in 2h 11m: Pass: 100%/89 | Total: 2d 15h | Avg: 43m 06s | Max: 1h 14m | Hits: 180%/10936
  • 🟩 cub: Pass: 100%/44 | Total: 1d 15h | Avg: 54m 30s | Max: 1h 14m | Hits: 180%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 13h | Avg: 53m 57s | Max:  1h 14m | Hits: 180%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 09m
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 11m | Avg:  1h 02m | Max:  1h 04m | Hits: 159%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m
      🟩 12.6               Pass: 100%/37  | Total:  1d 08h | Avg: 52m 33s | Max:  1h 14m | Hits: 187%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 59m | Avg: 59m 30s | Max:  1h 00m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 11m | Avg:  1h 02m | Max:  1h 04m | Hits: 159%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m
      🟩 nvcc12.6           Pass: 100%/35  | Total:  1d 06h | Avg: 52m 10s | Max:  1h 14m | Hits: 187%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 30s | Max:  1h 00m
      🟩 nvcc               Pass: 100%/42  | Total:  1d 13h | Avg: 54m 15s | Max:  1h 14m | Hits: 180%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 57m | Avg: 59m 25s | Max:  1h 02m
      🟩 Clang15            Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 01m
      🟩 Clang16            Pass: 100%/2   | Total:  1h 56m | Avg: 58m 18s | Max: 58m 54s
      🟩 Clang17            Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 02m
      🟩 Clang18            Pass: 100%/7   | Total:  5h 50m | Avg: 50m 05s | Max:  1h 04m
      🟩 GCC7               Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 04m
      🟩 GCC8               Pass: 100%/1   | Total: 54m 51s | Avg: 54m 51s | Max: 54m 51s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 36s | Max:  1h 04m
      🟩 GCC10              Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m
      🟩 GCC11              Pass: 100%/2   | Total:  1h 53m | Avg: 56m 39s | Max: 56m 48s
      🟩 GCC12              Pass: 100%/4   | Total:  3h 02m | Avg: 45m 44s | Max:  1h 04m
      🟩 GCC13              Pass: 100%/8   | Total:  5h 09m | Avg: 38m 38s | Max:  1h 09m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 13m | Hits: 199%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 27m | Avg:  1h 13m | Max:  1h 14m | Hits: 162%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 46m | Avg: 55m 41s | Max:  1h 04m
      🟩 GCC                Pass: 100%/21  | Total: 17h 06m | Avg: 48m 53s | Max:  1h 09m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 42m | Avg:  1h 10m | Max:  1h 14m | Hits: 180%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 54m 44s | Avg: 27m 22s | Max: 29m 25s
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 21m | Avg: 32m 40s | Max:  1h 04m
      🟩 v100               Pass: 100%/34  | Total:  1d 10h | Avg:  1h 01m | Max:  1h 14m | Hits: 180%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 13h | Avg:  1h 00m | Max:  1h 14m | Hits: 180%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 14s | Avg: 22m 14s | Max: 22m 14s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 03s | Avg: 17m 03s | Max: 17m 03s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 21m | Avg: 27m 12s | Max: 29m 25s
      🟩 TestGPU            Pass: 100%/2   | Total: 43m 17s | Avg: 21m 38s | Max: 21m 54s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 54m 44s | Avg: 27m 22s | Max: 29m 25s
      🟩 90a                Pass: 100%/1   | Total: 28m 17s | Avg: 28m 17s | Max: 28m 17s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 20h 41m | Avg:  1h 02m | Max:  1h 13m | Hits: 188%/2664  
      🟩 20                 Pass: 100%/24  | Total: 19h 16m | Avg: 48m 10s | Max:  1h 14m | Hits: 158%/888   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 23h 12m | Avg: 33m 09s | Max: 1h 08m | Hits: 179%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 36m 53s | Avg: 18m 26s | Max: 26m 18s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total: 22h 12m | Avg: 33m 18s | Max:  1h 08m | Hits: 179%/7384  
      🟩 arm64              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 13s | Max: 31m 53s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 04m | Avg: 36m 56s | Max: 50m 56s | Hits: 177%/1846  
      🟩 12.5               Pass: 100%/2   | Total:  1h 47m | Avg: 53m 45s | Max: 55m 35s
      🟩 12.6               Pass: 100%/35  | Total: 18h 20m | Avg: 31m 26s | Max:  1h 08m | Hits: 180%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 00m | Avg: 30m 10s | Max: 31m 14s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 04m | Avg: 36m 56s | Max: 50m 56s | Hits: 177%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 47m | Avg: 53m 45s | Max: 55m 35s
      🟩 nvcc12.6           Pass: 100%/33  | Total: 17h 20m | Avg: 31m 31s | Max:  1h 08m | Hits: 180%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 00m | Avg: 30m 10s | Max: 31m 14s
      🟩 nvcc               Pass: 100%/40  | Total: 22h 12m | Avg: 33m 18s | Max:  1h 08m | Hits: 179%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 27s | Max: 33m 37s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 21s | Max: 33m 00s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 04m | Avg: 32m 14s | Max: 33m 23s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 04m | Avg: 32m 29s | Max: 32m 48s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 47m | Avg: 23m 53s | Max: 31m 14s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 35s | Max: 33m 54s
      🟩 GCC8               Pass: 100%/1   | Total: 34m 54s | Avg: 34m 54s | Max: 34m 54s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 18s | Max: 35m 39s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 12m | Avg: 36m 12s | Max: 36m 34s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 15s | Max: 36m 52s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 02s | Max: 36m 30s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 57m | Avg: 22m 08s | Max: 37m 32s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 51m | Avg: 55m 39s | Max:  1h 00m | Hits: 177%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 08m | Hits: 181%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 47m | Avg: 53m 45s | Max: 55m 35s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 05m | Avg: 28m 32s | Max: 33m 37s
      🟩 GCC                Pass: 100%/19  | Total:  9h 20m | Avg: 29m 31s | Max: 37m 32s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 59m | Avg: 59m 49s | Max:  1h 08m | Hits: 179%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 47m | Avg: 53m 45s | Max: 55m 35s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  2h 21m | Avg: 17m 38s | Max: 37m 32s
      🟩 v100               Pass: 100%/34  | Total: 20h 51m | Avg: 36m 49s | Max:  1h 08m | Hits: 179%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 22h 24m | Avg: 36m 20s | Max:  1h 08m | Hits: 179%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 16m 07s | Avg:  8m 03s | Max:  8m 08s
      🟩 TestGPU            Pass: 100%/3   | Total: 31m 53s | Avg: 10m 37s | Max: 10m 57s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 20m 27s | Avg: 20m 27s | Max: 20m 27s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 36m | Avg: 37m 49s | Max:  1h 00m | Hits: 177%/5538  
      🟩 20                 Pass: 100%/20  | Total:  9h 59m | Avg: 29m 58s | Max:  1h 08m | Hits: 186%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 22s | Avg: 5m 41s | Max: 9m 14s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  9m 14s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  9m 14s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  9m 14s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  9m 14s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  9m 14s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  9m 14s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  9m 14s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 08s | Avg:  2m 08s | Max:  2m 08s
      🟩 Test               Pass: 100%/1   | Total:  9m 14s | Avg:  9m 14s | Max:  9m 14s
    
  • 🟩 python: Pass: 100%/1 | Total: 34m 49s | Avg: 34m 49s | Max: 34m 49s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 34m 49s | Avg: 34m 49s | Max: 34m 49s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 34m 49s | Avg: 34m 49s | Max: 34m 49s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 34m 49s | Avg: 34m 49s | Max: 34m 49s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 34m 49s | Avg: 34m 49s | Max: 34m 49s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 34m 49s | Avg: 34m 49s | Max: 34m 49s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 34m 49s | Avg: 34m 49s | Max: 34m 49s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 34m 49s | Avg: 34m 49s | Max: 34m 49s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 34m 49s | Avg: 34m 49s | Max: 34m 49s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

@miscco miscco merged commit bac4e75 into NVIDIA:main Jan 31, 2025
104 of 108 checks passed

Git push to origin failed for branch/2.8.x with exitcode 128

Projects
Status: Done

4 participants