Move some CUB tunings to dedicated headers #3096

Merged: 2 commits merged into NVIDIA:main on Dec 10, 2024

bernhardmgruber commented Dec 9, 2024

Move the policy hubs of

  • adjacent_difference
  • batch_memcpy
  • radix_sort

to dedicated headers.

Fixes a part of #3097
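
For readers unfamiliar with the term, here is a minimal sketch of what a "policy hub" in its own header might look like. It is not the code moved by this PR: the namespace `sketch`, the names `agent_policy`, `policy300`, `policy800`, and `active_policy`, and all tuning numbers are hypothetical and chosen only for illustration. The idea is that the per-architecture tuning parameters live in one self-contained header, and the dispatch code includes that header and asks the hub which policy applies.

```cpp
// Hypothetical sketch of a tuning header. Names and numbers are
// illustrative assumptions and do not match the real CUB sources.
#pragma once

#include <type_traits>

namespace sketch
{
// Tuning parameters for one kernel configuration.
template <int BlockThreads, int ItemsPerThread>
struct agent_policy
{
  static constexpr int block_threads    = BlockThreads;
  static constexpr int items_per_thread = ItemsPerThread;
  static constexpr int tile_size        = BlockThreads * ItemsPerThread;
};

// The "policy hub": one tuning per minimum architecture, plus a way to
// pick the tuning that applies to a given target architecture.
template <typename ValueT>
struct policy_hub
{
  // Fallback tuning for older architectures (made-up values).
  struct policy300
  {
    static constexpr int min_arch = 300;
    using algo_policy             = agent_policy<128, 7>;
  };

  // Tuning for newer architectures; fewer items per thread for wide types
  // (made-up values).
  struct policy800
  {
    static constexpr int min_arch = 800;
    using algo_policy             = agent_policy<256, (sizeof(ValueT) <= 4 ? 8 : 4)>;
  };

  // Select the newest policy whose min_arch does not exceed Arch.
  template <int Arch>
  using active_policy =
    typename std::conditional<(Arch >= policy800::min_arch), policy800, policy300>::type;
};
} // namespace sketch
```

A dispatch header would then only need to include this file and refer to something like `sketch::policy_hub<T>::active_policy<Arch>::algo_policy`, which keeps the tuning data separate from the dispatch logic.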

bernhardmgruber marked this pull request as ready for review December 9, 2024 17:05
bernhardmgruber requested review from a team as code owners December 9, 2024 17:05

🟩 CI finished in 1h 11m: Pass: 100%/94 | Total: 1d 19h | Avg: 27m 28s | Max: 56m 42s | Hits: 92%/12324
  • 🟩 thrust: Pass: 100%/46 | Total: 11h 11m | Avg: 14m 35s | Max: 29m 50s | Hits: 94%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 28m 42s | Avg: 14m 21s | Max: 16m 19s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total: 10h 47m | Avg: 14m 42s | Max: 29m 50s | Hits:  94%/9260  
      🟩 arm64              Pass: 100%/2   | Total: 23m 49s | Avg: 11m 54s | Max: 12m 37s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  1h 35m | Avg: 13m 42s | Max: 28m 50s | Hits:  93%/1852  
      🟩 12.5               Pass: 100%/2   | Total: 52m 24s | Avg: 26m 12s | Max: 27m 09s
      🟩 12.6               Pass: 100%/37  | Total:  8h 42m | Avg: 14m 07s | Max: 29m 50s | Hits:  95%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 23m 39s | Avg: 11m 49s | Max: 11m 51s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  1h 35m | Avg: 13m 42s | Max: 28m 50s | Hits:  93%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 52m 24s | Avg: 26m 12s | Max: 27m 09s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  8h 19m | Avg: 14m 15s | Max: 29m 50s | Hits:  95%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 23m 39s | Avg: 11m 49s | Max: 11m 51s
      🟩 nvcc               Pass: 100%/44  | Total: 10h 47m | Avg: 14m 42s | Max: 29m 50s | Hits:  94%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 49m 06s | Avg: 12m 16s | Max: 14m 28s
      🟩 Clang10            Pass: 100%/1   | Total: 15m 04s | Avg: 15m 04s | Max: 15m 04s
      🟩 Clang11            Pass: 100%/1   | Total: 11m 48s | Avg: 11m 48s | Max: 11m 48s
      🟩 Clang12            Pass: 100%/1   | Total: 12m 50s | Avg: 12m 50s | Max: 12m 50s
      🟩 Clang13            Pass: 100%/1   | Total: 12m 08s | Avg: 12m 08s | Max: 12m 08s
      🟩 Clang14            Pass: 100%/1   | Total: 11m 48s | Avg: 11m 48s | Max: 11m 48s
      🟩 Clang15            Pass: 100%/1   | Total: 12m 29s | Avg: 12m 29s | Max: 12m 29s
      🟩 Clang16            Pass: 100%/1   | Total: 12m 14s | Avg: 12m 14s | Max: 12m 14s
      🟩 Clang17            Pass: 100%/1   | Total: 12m 39s | Avg: 12m 39s | Max: 12m 39s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 18m | Avg: 11m 12s | Max: 12m 14s
      🟩 GCC6               Pass: 100%/2   | Total: 22m 13s | Avg: 11m 06s | Max: 11m 56s
      🟩 GCC7               Pass: 100%/2   | Total: 22m 56s | Avg: 11m 28s | Max: 11m 50s
      🟩 GCC8               Pass: 100%/1   | Total: 13m 03s | Avg: 13m 03s | Max: 13m 03s
      🟩 GCC9               Pass: 100%/3   | Total: 35m 52s | Avg: 11m 57s | Max: 14m 10s
      🟩 GCC10              Pass: 100%/1   | Total: 12m 54s | Avg: 12m 54s | Max: 12m 54s
      🟩 GCC11              Pass: 100%/1   | Total: 12m 28s | Avg: 12m 28s | Max: 12m 28s
      🟩 GCC12              Pass: 100%/1   | Total: 14m 02s | Avg: 14m 02s | Max: 14m 02s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 42m | Avg: 12m 45s | Max: 16m 25s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 17m 14s | Avg: 17m 14s | Max: 17m 14s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 28m 50s | Avg: 28m 50s | Max: 28m 50s | Hits:  93%/1852  
      🟩 MSVC14.29          Pass: 100%/1   | Total: 25m 19s | Avg: 25m 19s | Max: 25m 19s | Hits:  93%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 23m | Avg: 27m 42s | Max: 29m 50s | Hits:  95%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 52m 24s | Avg: 26m 12s | Max: 27m 09s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  3h 48m | Avg: 12m 01s | Max: 15m 04s
      🟩 GCC                Pass: 100%/19  | Total:  3h 55m | Avg: 12m 23s | Max: 16m 25s
      🟩 Intel              Pass: 100%/1   | Total: 17m 14s | Avg: 17m 14s | Max: 17m 14s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 17m | Avg: 27m 27s | Max: 29m 50s | Hits:  94%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total: 52m 24s | Avg: 26m 12s | Max: 27m 09s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total: 11h 11m | Avg: 14m 35s | Max: 29m 50s | Hits:  94%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  9h 46m | Avg: 14m 39s | Max: 29m 50s | Hits:  93%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 39m 24s | Avg: 13m 08s | Max: 23m 55s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 44m 58s | Avg: 14m 59s | Max: 16m 25s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  8m 03s | Avg:  8m 03s | Max:  8m 03s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 54m 02s | Avg: 10m 48s | Max: 11m 29s
      🟩 14                 Pass: 100%/4   | Total:  1h 07m | Avg: 16m 46s | Max: 28m 50s | Hits:  93%/1852  
      🟩 17                 Pass: 100%/12  | Total:  3h 22m | Avg: 16m 51s | Max: 29m 21s | Hits:  93%/3704  
      🟩 20                 Pass: 100%/23  | Total:  5h 18m | Avg: 13m 51s | Max: 29m 50s | Hits:  96%/3704  
    
  • 🟩 cub: Pass: 100%/45 | Total: 1d 07h | Avg: 41m 40s | Max: 56m 42s | Hits: 87%/3064

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 05h | Avg: 41m 22s | Max: 56m 42s | Hits:  87%/3064  
      🟩 arm64              Pass: 100%/2   | Total:  1h 36m | Avg: 48m 24s | Max: 48m 58s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  4h 52m | Avg: 41m 50s | Max: 48m 57s | Hits:  87%/766   
      🟩 12.5               Pass: 100%/2   | Total:  1h 39m | Avg: 49m 54s | Max: 51m 24s
      🟩 12.6               Pass: 100%/36  | Total:  1d 00h | Avg: 41m 11s | Max: 56m 42s | Hits:  87%/2298  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max: 56m 42s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  4h 52m | Avg: 41m 50s | Max: 48m 57s | Hits:  87%/766   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 39m | Avg: 49m 54s | Max: 51m 24s
      🟩 nvcc12.6           Pass: 100%/34  | Total: 22h 49m | Avg: 40m 17s | Max: 55m 53s | Hits:  87%/2298  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max: 56m 42s
      🟩 nvcc               Pass: 100%/43  | Total:  1d 05h | Avg: 40m 59s | Max: 55m 53s | Hits:  87%/3064  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  2h 55m | Avg: 43m 46s | Max: 48m 57s
      🟩 Clang10            Pass: 100%/1   | Total: 47m 58s | Avg: 47m 58s | Max: 47m 58s
      🟩 Clang11            Pass: 100%/1   | Total: 40m 55s | Avg: 40m 55s | Max: 40m 55s
      🟩 Clang12            Pass: 100%/1   | Total: 46m 49s | Avg: 46m 49s | Max: 46m 49s
      🟩 Clang13            Pass: 100%/1   | Total: 45m 18s | Avg: 45m 18s | Max: 45m 18s
      🟩 Clang14            Pass: 100%/1   | Total: 47m 29s | Avg: 47m 29s | Max: 47m 29s
      🟩 Clang15            Pass: 100%/1   | Total: 43m 40s | Avg: 43m 40s | Max: 43m 40s
      🟩 Clang16            Pass: 100%/1   | Total: 46m 29s | Avg: 46m 29s | Max: 46m 29s
      🟩 Clang17            Pass: 100%/1   | Total: 47m 12s | Avg: 47m 12s | Max: 47m 12s
      🟩 Clang18            Pass: 100%/7   | Total:  4h 49m | Avg: 41m 17s | Max: 56m 42s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 14m | Avg: 37m 21s | Max: 38m 24s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 24m | Avg: 42m 02s | Max: 43m 53s
      🟩 GCC8               Pass: 100%/1   | Total: 42m 29s | Avg: 42m 29s | Max: 42m 29s
      🟩 GCC9               Pass: 100%/3   | Total:  2h 06m | Avg: 42m 04s | Max: 46m 41s
      🟩 GCC10              Pass: 100%/1   | Total: 46m 32s | Avg: 46m 32s | Max: 46m 32s
      🟩 GCC11              Pass: 100%/1   | Total: 44m 32s | Avg: 44m 32s | Max: 44m 32s
      🟩 GCC12              Pass: 100%/1   | Total: 42m 51s | Avg: 42m 51s | Max: 42m 51s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 47m | Avg: 28m 25s | Max: 48m 58s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 48m 07s | Avg: 48m 07s | Max: 48m 07s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 48m 51s | Avg: 48m 51s | Max: 48m 51s | Hits:  87%/766   
      🟩 MSVC14.29          Pass: 100%/1   | Total: 51m 30s | Avg: 51m 30s | Max: 51m 30s | Hits:  87%/766   
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 48m | Avg: 54m 19s | Max: 55m 53s | Hits:  87%/1532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 39m | Avg: 49m 54s | Max: 51m 24s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 13h 49m | Avg: 43m 40s | Max: 56m 42s
      🟩 GCC                Pass: 100%/19  | Total: 11h 28m | Avg: 36m 15s | Max: 48m 58s
      🟩 Intel              Pass: 100%/1   | Total: 48m 07s | Avg: 48m 07s | Max: 48m 07s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 29m | Avg: 52m 15s | Max: 55m 53s | Hits:  87%/3064  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 39m | Avg: 49m 54s | Max: 51m 24s
    🟩 gpu
      🟩 v100               Pass: 100%/45  | Total:  1d 07h | Avg: 41m 40s | Max: 56m 42s | Hits:  87%/3064  
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  1d 05h | Avg: 45m 05s | Max: 56m 42s | Hits:  87%/3064  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 18m 13s | Avg: 18m 13s | Max: 18m 13s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 11s | Avg: 15m 11s | Max: 15m 11s
      🟩 HostLaunch         Pass: 100%/2   | Total: 39m 39s | Avg: 19m 49s | Max: 20m 50s
      🟩 TestGPU            Pass: 100%/2   | Total: 43m 57s | Avg: 21m 58s | Max: 22m 32s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 18m 48s | Avg: 18m 48s | Max: 18m 48s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  3h 23m | Avg: 40m 45s | Max: 48m 57s
      🟩 14                 Pass: 100%/4   | Total:  2h 55m | Avg: 43m 51s | Max: 48m 51s | Hits:  87%/766   
      🟩 17                 Pass: 100%/12  | Total:  9h 28m | Avg: 47m 23s | Max: 56m 42s | Hits:  87%/1532  
      🟩 20                 Pass: 100%/24  | Total: 15h 27m | Avg: 38m 39s | Max: 56m 34s | Hits:  87%/766   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 15s | Avg: 4m 37s | Max: 7m 19s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 19s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 19s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 19s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 19s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 19s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 19s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 19s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 56s | Avg:  1m 56s | Max:  1m 56s
      🟩 Test               Pass: 100%/1   | Total:  7m 19s | Avg:  7m 19s | Max:  7m 19s
    
  • 🟩 python: Pass: 100%/1 | Total: 26m 01s | Avg: 26m 01s | Max: 26m 01s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 26m 01s | Avg: 26m 01s | Max: 26m 01s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 26m 01s | Avg: 26m 01s | Max: 26m 01s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 26m 01s | Avg: 26m 01s | Max: 26m 01s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 26m 01s | Avg: 26m 01s | Max: 26m 01s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 26m 01s | Avg: 26m 01s | Max: 26m 01s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 26m 01s | Avg: 26m 01s | Max: 26m 01s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 26m 01s | Avg: 26m 01s | Max: 26m 01s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 26m 01s | Avg: 26m 01s | Max: 26m 01s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 94)

# Runner
70 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16

NVIDIA deleted a comment from copy-pr-bot bot Dec 10, 2024

🟩 CI finished in 54m 51s: Pass: 100%/94 | Total: 13h 48m | Avg: 8m 48s | Max: 30m 00s | Hits: 98%/12324
  • 🟩 thrust: Pass: 100%/46 | Total: 6h 22m | Avg: 8m 18s | Max: 24m 44s | Hits: 99%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 19m 53s | Avg:  9m 56s | Max: 13m 16s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total:  6h 10m | Avg:  8m 25s | Max: 24m 44s | Hits:  99%/9260  
      🟩 arm64              Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 38s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total: 52m 38s | Avg:  7m 31s | Max: 24m 44s | Hits:  98%/1852  
      🟩 12.5               Pass: 100%/2   | Total: 28m 15s | Avg: 14m 07s | Max: 14m 09s
      🟩 12.6               Pass: 100%/37  | Total:  5h 01m | Avg:  8m 08s | Max: 22m 47s | Hits:  99%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 34s
      🟩 nvcc11.1           Pass: 100%/7   | Total: 52m 38s | Avg:  7m 31s | Max: 24m 44s | Hits:  98%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 15s | Avg: 14m 07s | Max: 14m 09s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  4h 50m | Avg:  8m 17s | Max: 22m 47s | Hits:  99%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 34s
      🟩 nvcc               Pass: 100%/44  | Total:  6h 11m | Avg:  8m 26s | Max: 24m 44s | Hits:  99%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 23m 42s | Avg:  5m 55s | Max:  7m 19s
      🟩 Clang10            Pass: 100%/1   | Total:  7m 15s | Avg:  7m 15s | Max:  7m 15s
      🟩 Clang11            Pass: 100%/1   | Total:  6m 03s | Avg:  6m 03s | Max:  6m 03s
      🟩 Clang12            Pass: 100%/1   | Total:  6m 03s | Avg:  6m 03s | Max:  6m 03s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
      🟩 Clang14            Pass: 100%/1   | Total:  6m 09s | Avg:  6m 09s | Max:  6m 09s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
      🟩 Clang16            Pass: 100%/1   | Total:  6m 18s | Avg:  6m 18s | Max:  6m 18s
      🟩 Clang17            Pass: 100%/1   | Total:  7m 18s | Avg:  7m 18s | Max:  7m 18s
      🟩 Clang18            Pass: 100%/7   | Total: 46m 45s | Avg:  6m 40s | Max: 10m 49s
      🟩 GCC6               Pass: 100%/2   | Total:  9m 05s | Avg:  4m 32s | Max:  4m 41s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 31s | Avg:  5m 45s | Max:  6m 01s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
      🟩 GCC9               Pass: 100%/3   | Total: 15m 42s | Avg:  5m 14s | Max:  6m 29s
      🟩 GCC10              Pass: 100%/1   | Total:  6m 09s | Avg:  6m 09s | Max:  6m 09s
      🟩 GCC11              Pass: 100%/1   | Total:  6m 30s | Avg:  6m 30s | Max:  6m 30s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 10s | Avg:  6m 10s | Max:  6m 10s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 05m | Avg:  8m 08s | Max: 13m 16s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  7m 35s | Avg:  7m 35s | Max:  7m 35s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 24m 44s | Avg: 24m 44s | Max: 24m 44s | Hits:  98%/1852  
      🟩 MSVC14.29          Pass: 100%/1   | Total: 16m 47s | Avg: 16m 47s | Max: 16m 47s | Hits:  98%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 57m 33s | Avg: 19m 11s | Max: 22m 47s | Hits:  99%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 15s | Avg: 14m 07s | Max: 14m 09s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  2h 01m | Avg:  6m 22s | Max: 10m 49s
      🟩 GCC                Pass: 100%/19  | Total:  2h 06m | Avg:  6m 38s | Max: 13m 16s
      🟩 Intel              Pass: 100%/1   | Total:  7m 35s | Avg:  7m 35s | Max:  7m 35s
      🟩 MSVC               Pass: 100%/5   | Total:  1h 39m | Avg: 19m 48s | Max: 24m 44s | Hits:  99%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 15s | Avg: 14m 07s | Max: 14m 09s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total:  6h 22m | Avg:  8m 18s | Max: 24m 44s | Hits:  99%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  5h 06m | Avg:  7m 40s | Max: 24m 44s | Hits:  98%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 38m 29s | Avg: 12m 49s | Max: 22m 47s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 36m 43s | Avg: 12m 14s | Max: 13m 16s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  5m 40s | Avg:  5m 40s | Max:  5m 40s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 24m 47s | Avg:  4m 57s | Max:  6m 47s
      🟩 14                 Pass: 100%/4   | Total: 42m 45s | Avg: 10m 41s | Max: 24m 44s | Hits:  98%/1852  
      🟩 17                 Pass: 100%/12  | Total:  1h 43m | Avg:  8m 38s | Max: 16m 50s | Hits:  98%/3704  
      🟩 20                 Pass: 100%/23  | Total:  3h 10m | Avg:  8m 17s | Max: 22m 47s | Hits:  99%/3704  
    
  • 🟩 cub: Pass: 100%/45 | Total: 6h 47m | Avg: 9m 03s | Max: 30m 00s | Hits: 96%/3064

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 36m | Avg:  9m 13s | Max: 30m 00s | Hits:  96%/3064  
      🟩 arm64              Pass: 100%/2   | Total: 10m 45s | Avg:  5m 22s | Max:  5m 37s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total: 45m 36s | Avg:  6m 30s | Max: 17m 04s | Hits:  96%/766   
      🟩 12.5               Pass: 100%/2   | Total: 20m 33s | Avg: 10m 16s | Max: 10m 39s
      🟩 12.6               Pass: 100%/36  | Total:  5h 41m | Avg:  9m 28s | Max: 30m 00s | Hits:  96%/2298  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 01s | Avg:  4m 30s | Max:  4m 33s
      🟩 nvcc11.1           Pass: 100%/7   | Total: 45m 36s | Avg:  6m 30s | Max: 17m 04s | Hits:  96%/766   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 20m 33s | Avg: 10m 16s | Max: 10m 39s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  5h 32m | Avg:  9m 46s | Max: 30m 00s | Hits:  96%/2298  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 01s | Avg:  4m 30s | Max:  4m 33s
      🟩 nvcc               Pass: 100%/43  | Total:  6h 38m | Avg:  9m 15s | Max: 30m 00s | Hits:  96%/3064  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 25m 28s | Avg:  6m 22s | Max:  7m 39s
      🟩 Clang10            Pass: 100%/1   | Total:  7m 22s | Avg:  7m 22s | Max:  7m 22s
      🟩 Clang11            Pass: 100%/1   | Total:  7m 42s | Avg:  7m 42s | Max:  7m 42s
      🟩 Clang12            Pass: 100%/1   | Total:  7m 20s | Avg:  7m 20s | Max:  7m 20s
      🟩 Clang13            Pass: 100%/1   | Total:  6m 29s | Avg:  6m 29s | Max:  6m 29s
      🟩 Clang14            Pass: 100%/1   | Total:  6m 07s | Avg:  6m 07s | Max:  6m 07s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 53s | Avg:  5m 53s | Max:  5m 53s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 55s | Avg:  5m 55s | Max:  5m 55s
      🟩 Clang17            Pass: 100%/1   | Total:  6m 21s | Avg:  6m 21s | Max:  6m 21s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 07m | Avg:  9m 39s | Max: 23m 15s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  4m 42s
      🟩 GCC7               Pass: 100%/2   | Total: 12m 37s | Avg:  6m 18s | Max:  7m 15s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 40s | Avg:  5m 40s | Max:  5m 40s
      🟩 GCC9               Pass: 100%/3   | Total: 15m 10s | Avg:  5m 03s | Max:  5m 53s
      🟩 GCC10              Pass: 100%/1   | Total:  6m 15s | Avg:  6m 15s | Max:  6m 15s
      🟩 GCC11              Pass: 100%/1   | Total:  6m 16s | Avg:  6m 16s | Max:  6m 16s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 12s | Avg:  6m 12s | Max:  6m 12s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 47m | Avg: 13m 23s | Max: 30m 00s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  7m 36s | Avg:  7m 36s | Max:  7m 36s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 17m 04s | Avg: 17m 04s | Max: 17m 04s | Hits:  96%/766   
      🟩 MSVC14.29          Pass: 100%/1   | Total: 15m 17s | Avg: 15m 17s | Max: 15m 17s | Hits:  96%/766   
      🟩 MSVC14.39          Pass: 100%/2   | Total: 32m 33s | Avg: 16m 16s | Max: 16m 28s | Hits:  96%/1532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 20m 33s | Avg: 10m 16s | Max: 10m 39s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  2h 26m | Avg:  7m 41s | Max: 23m 15s
      🟩 GCC                Pass: 100%/19  | Total:  2h 48m | Avg:  8m 50s | Max: 30m 00s
      🟩 Intel              Pass: 100%/1   | Total:  7m 36s | Avg:  7m 36s | Max:  7m 36s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 04m | Avg: 16m 13s | Max: 17m 04s | Hits:  96%/3064  
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 33s | Avg: 10m 16s | Max: 10m 39s
    🟩 gpu
      🟩 v100               Pass: 100%/45  | Total:  6h 47m | Avg:  9m 03s | Max: 30m 00s | Hits:  96%/3064  
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  4h 42m | Avg:  7m 15s | Max: 17m 04s | Hits:  96%/3064  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 17m 42s | Avg: 17m 42s | Max: 17m 42s
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 47s | Avg: 16m 47s | Max: 16m 47s
      🟩 HostLaunch         Pass: 100%/2   | Total: 36m 45s | Avg: 18m 22s | Max: 19m 08s
      🟩 TestGPU            Pass: 100%/2   | Total: 53m 15s | Avg: 26m 37s | Max: 30m 00s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 42s | Avg:  4m 42s | Max:  4m 42s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 26m 31s | Avg:  5m 18s | Max:  7m 27s
      🟩 14                 Pass: 100%/4   | Total: 36m 40s | Avg:  9m 10s | Max: 17m 04s | Hits:  96%/766   
      🟩 17                 Pass: 100%/12  | Total:  1h 34m | Avg:  7m 54s | Max: 16m 05s | Hits:  96%/1532  
      🟩 20                 Pass: 100%/24  | Total:  4h 09m | Avg: 10m 23s | Max: 30m 00s | Hits:  96%/766   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 13s | Avg: 5m 36s | Max: 9m 08s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 13s | Avg:  5m 36s | Max:  9m 08s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 11m 13s | Avg:  5m 36s | Max:  9m 08s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 13s | Avg:  5m 36s | Max:  9m 08s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 13s | Avg:  5m 36s | Max:  9m 08s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 13s | Avg:  5m 36s | Max:  9m 08s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 13s | Avg:  5m 36s | Max:  9m 08s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 11m 13s | Avg:  5m 36s | Max:  9m 08s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 05s | Avg:  2m 05s | Max:  2m 05s
      🟩 Test               Pass: 100%/1   | Total:  9m 08s | Avg:  9m 08s | Max:  9m 08s
    
  • 🟩 python: Pass: 100%/1 | Total: 27m 24s | Avg: 27m 24s | Max: 27m 24s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 27m 24s | Avg: 27m 24s | Max: 27m 24s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 27m 24s | Avg: 27m 24s | Max: 27m 24s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 27m 24s | Avg: 27m 24s | Max: 27m 24s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 27m 24s | Avg: 27m 24s | Max: 27m 24s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 27m 24s | Avg: 27m 24s | Max: 27m 24s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 27m 24s | Avg: 27m 24s | Max: 27m 24s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 27m 24s | Avg: 27m 24s | Max: 27m 24s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 27m 24s | Avg: 27m 24s | Max: 27m 24s
    

bernhardmgruber merged commit 6e8bfc7 into NVIDIA:main Dec 10, 2024
108 checks passed
bernhardmgruber deleted the tuning_headers branch December 10, 2024 17:17