Add b200 tunings for histogram #3616

Merged: 1 commit, merged on Jan 30, 2025

bernhardmgruber (Contributor)

No description provided.

Contributor

🟩 CI finished in 1h 13m: Pass: 100%/89 | Total: 15h 22m | Avg: 10m 21s | Max: 37m 25s | Hits: 418%/10896
  • 🟩 cub: Pass: 100%/44 | Total: 8h 13m | Avg: 11m 13s | Max: 32m 58s | Hits: 529%/3512

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  8h 01m | Avg: 11m 27s | Max: 32m 58s | Hits: 529%/3512  
      🟩 arm64              Pass: 100%/2   | Total: 12m 37s | Avg:  6m 18s | Max:  6m 40s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 50m 18s | Avg: 10m 03s | Max: 25m 23s | Hits: 529%/878   
      🟩 12.5               Pass: 100%/2   | Total: 24m 18s | Avg: 12m 09s | Max: 12m 19s
      🟩 12.6               Pass: 100%/37  | Total:  6h 59m | Avg: 11m 19s | Max: 32m 58s | Hits: 529%/2634  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 59s | Avg:  5m 29s | Max:  5m 32s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 50m 18s | Avg: 10m 03s | Max: 25m 23s | Hits: 529%/878   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 24m 18s | Avg: 12m 09s | Max: 12m 19s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  6h 48m | Avg: 11m 39s | Max: 32m 58s | Hits: 529%/2634  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 59s | Avg:  5m 29s | Max:  5m 32s
      🟩 nvcc               Pass: 100%/42  | Total:  8h 02m | Avg: 11m 29s | Max: 32m 58s | Hits: 529%/3512  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 26m 01s | Avg:  6m 30s | Max:  6m 42s
      🟩 Clang15            Pass: 100%/2   | Total: 13m 45s | Avg:  6m 52s | Max:  6m 54s
      🟩 Clang16            Pass: 100%/2   | Total: 13m 20s | Avg:  6m 40s | Max:  7m 02s
      🟩 Clang17            Pass: 100%/2   | Total: 13m 42s | Avg:  6m 51s | Max:  6m 55s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 13m | Avg: 10m 33s | Max: 22m 38s
      🟩 GCC7               Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max:  6m 53s
      🟩 GCC8               Pass: 100%/1   | Total:  6m 11s | Avg:  6m 11s | Max:  6m 11s
      🟩 GCC9               Pass: 100%/2   | Total: 12m 37s | Avg:  6m 18s | Max:  6m 35s
      🟩 GCC10              Pass: 100%/2   | Total: 12m 42s | Avg:  6m 21s | Max:  6m 27s
      🟩 GCC11              Pass: 100%/2   | Total: 13m 01s | Avg:  6m 30s | Max:  6m 31s
      🟩 GCC12              Pass: 100%/4   | Total: 45m 29s | Avg: 11m 22s | Max: 26m 09s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 50m | Avg: 13m 47s | Max: 25m 44s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 53m 17s | Avg: 26m 38s | Max: 27m 54s | Hits: 529%/1756  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 02m | Avg: 31m 07s | Max: 32m 58s | Hits: 529%/1756  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 24m 18s | Avg: 12m 09s | Max: 12m 19s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 20m | Avg:  8m 16s | Max: 22m 38s
      🟩 GCC                Pass: 100%/21  | Total:  3h 33m | Avg: 10m 09s | Max: 26m 09s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 55m | Avg: 28m 53s | Max: 32m 58s | Hits: 529%/3512  
      🟩 NVHPC              Pass: 100%/2   | Total: 24m 18s | Avg: 12m 09s | Max: 12m 19s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 31m 10s | Avg: 15m 35s | Max: 26m 09s
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 22m | Avg: 17m 45s | Max: 25m 44s
      🟩 v100               Pass: 100%/34  | Total:  5h 20m | Avg:  9m 25s | Max: 32m 58s | Hits: 529%/3512  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 38m | Avg:  9m 09s | Max: 32m 58s | Hits: 529%/3512  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 37s | Avg: 21m 37s | Max: 21m 37s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 21s | Avg: 15m 21s | Max: 15m 21s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 50s | Max: 26m 09s
      🟩 TestGPU            Pass: 100%/2   | Total: 43m 49s | Avg: 21m 54s | Max: 22m 08s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 31m 10s | Avg: 15m 35s | Max: 26m 09s
      🟩 90a                Pass: 100%/1   | Total:  5m 27s | Avg:  5m 27s | Max:  5m 27s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 18m | Avg:  9m 55s | Max: 29m 17s | Hits: 529%/2634  
      🟩 20                 Pass: 100%/24  | Total:  4h 55m | Avg: 12m 18s | Max: 32m 58s | Hits: 529%/878   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 6h 33m | Avg: 9m 22s | Max: 37m 25s | Hits: 365%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 33s | Avg:  8m 16s | Max: 10m 38s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total:  6h 23m | Avg:  9m 35s | Max: 37m 25s | Hits: 365%/7384  
      🟩 arm64              Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 09s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 46m 41s | Avg:  9m 20s | Max: 25m 12s | Hits: 365%/1846  
      🟩 12.5               Pass: 100%/2   | Total: 29m 20s | Avg: 14m 40s | Max: 14m 52s
      🟩 12.6               Pass: 100%/35  | Total:  5h 17m | Avg:  9m 04s | Max: 37m 25s | Hits: 365%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  5m 59s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 46m 41s | Avg:  9m 20s | Max: 25m 12s | Hits: 365%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 29m 20s | Avg: 14m 40s | Max: 14m 52s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  5h 06m | Avg:  9m 16s | Max: 37m 25s | Hits: 365%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  5m 59s
      🟩 nvcc               Pass: 100%/40  | Total:  6h 22m | Avg:  9m 33s | Max: 37m 25s | Hits: 365%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 32s | Avg:  5m 23s | Max:  5m 32s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 39s | Avg:  5m 49s | Max:  6m 00s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 45s | Avg:  5m 52s | Max:  5m 53s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 37s
      🟩 Clang18            Pass: 100%/7   | Total: 44m 51s | Avg:  6m 24s | Max:  9m 58s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 29s | Avg:  5m 44s | Max:  5m 57s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 44s | Avg:  5m 44s | Max:  5m 44s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  6m 08s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  5m 45s
      🟩 GCC11              Pass: 100%/2   | Total: 43m 32s | Avg: 21m 46s | Max: 37m 25s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 26s
      🟩 GCC13              Pass: 100%/8   | Total: 56m 55s | Avg:  7m 06s | Max: 11m 16s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 53m 54s | Avg: 26m 57s | Max: 28m 42s | Hits: 365%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 56m 27s | Avg: 28m 13s | Max: 29m 37s | Hits: 365%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 29m 20s | Avg: 14m 40s | Max: 14m 52s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 40m | Avg:  5m 55s | Max:  9m 58s
      🟩 GCC                Pass: 100%/19  | Total:  2h 32m | Avg:  8m 02s | Max: 37m 25s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 50m | Avg: 27m 35s | Max: 29m 37s | Hits: 365%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total: 29m 20s | Avg: 14m 40s | Max: 14m 52s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  1h 04m | Avg:  8m 05s | Max: 11m 16s
      🟩 v100               Pass: 100%/34  | Total:  5h 28m | Avg:  9m 40s | Max: 37m 25s | Hits: 365%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 46m | Avg:  9m 21s | Max: 37m 25s | Hits: 365%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 13s | Avg:  7m 36s | Max:  7m 40s
      🟩 TestGPU            Pass: 100%/3   | Total: 31m 52s | Avg: 10m 37s | Max: 11m 16s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 37s | Avg:  4m 37s | Max:  4m 37s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 37m | Avg: 10m 53s | Max: 37m 25s | Hits: 365%/5538  
      🟩 20                 Pass: 100%/20  | Total:  2h 38m | Avg:  7m 56s | Max: 29m 37s | Hits: 365%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 8m 03s | Avg: 4m 01s | Max: 6m 01s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  6m 01s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  6m 01s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  6m 01s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  6m 01s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  6m 01s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  6m 01s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  6m 01s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 02s | Avg:  2m 02s | Max:  2m 02s
      🟩 Test               Pass: 100%/1   | Total:  6m 01s | Avg:  6m 01s | Max:  6m 01s
    
  • 🟩 python: Pass: 100%/1 | Total: 26m 44s | Avg: 26m 44s | Max: 26m 44s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 26m 44s | Avg: 26m 44s | Max: 26m 44s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 26m 44s | Avg: 26m 44s | Max: 26m 44s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 26m 44s | Avg: 26m 44s | Max: 26m 44s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 26m 44s | Avg: 26m 44s | Max: 26m 44s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 26m 44s | Avg: 26m 44s | Max: 26m 44s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 26m 44s | Avg: 26m 44s | Max: 26m 44s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 26m 44s | Avg: 26m 44s | Max: 26m 44s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 26m 44s | Avg: 26m 44s | Max: 26m 44s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

Jobs  Runner
65 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

miscco merged commit 51a890a into NVIDIA:main on Jan 30, 2025
103 of 107 checks passed
Contributor

Git push to origin failed for branch/2.8.x with exitcode 128

Projects
Status: Done
4 participants