Extract merge sort kernels to NVRTC compilable header #3438

Merged


NaderAlAwar
Contributor

Description

Closes #3386

Similar to #2231 and #3334, this PR extracts DeviceMergeSortBlockSortKernel, DeviceMergeSortPartitionKernel, and DeviceMergeSortMergeKernel into an NVRTC compilable header.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@NaderAlAwar NaderAlAwar requested review from a team as code owners January 17, 2025 15:06
Contributor

🟨 CI finished in 1h 54m: Pass: 96%/78 | Total: 2d 02h | Avg: 39m 03s | Max: 1h 13m | Hits: 189%/10972
  • 🟨 thrust: Pass: 91%/37 | Total: 22h 55m | Avg: 37m 09s | Max: 1h 13m | Hits: 121%/7408

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  91%/35  | Total: 21h 49m | Avg: 37m 24s | Max:  1h 13m | Hits: 121%/7408  
      🟩 arm64              Pass: 100%/2   | Total:  1h 05m | Avg: 32m 50s | Max: 34m 24s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  3h 30m | Avg: 42m 02s | Max:  1h 07m | Hits: 108%/1852  
      🟩 12.5               Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m
      🔍 12.6               Pass:  90%/30  | Total: 17h 09m | Avg: 34m 18s | Max:  1h 13m | Hits: 125%/5556  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 06m | Avg: 33m 04s | Max: 34m 04s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 30m | Avg: 42m 02s | Max:  1h 07m | Hits: 108%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m
      🔍 nvcc12.6           Pass:  89%/28  | Total: 16h 03m | Avg: 34m 24s | Max:  1h 13m | Hits: 125%/5556  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 06m | Avg: 33m 04s | Max: 34m 04s
      🔍 nvcc               Pass:  91%/35  | Total: 21h 48m | Avg: 37m 23s | Max:  1h 13m | Hits: 121%/7408  
    🚨 jobs: TestCPU 🚨
      🟩 Build              Pass: 100%/31  | Total: 21h 23m | Avg: 41m 23s | Max:  1h 13m | Hits: 121%/7408  
      🔥 TestCPU            Pass:   0%/3   | Total: 52m 58s | Avg: 17m 39s | Max: 36m 53s
      🟩 TestGPU            Pass: 100%/3   | Total: 39m 02s | Avg: 13m 00s | Max: 14m 45s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/14  | Total: 10h 35m | Avg: 45m 25s | Max:  1h 11m | Hits: 125%/5556  
      🔍 20                 Pass:  85%/21  | Total: 11h 38m | Avg: 33m 15s | Max:  1h 13m | Hits: 108%/1852  
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 20m | Avg: 35m 02s | Max: 37m 30s
      🟩 Clang15            Pass: 100%/1   | Total: 35m 52s | Avg: 35m 52s | Max: 35m 52s
      🟩 Clang16            Pass: 100%/1   | Total: 37m 42s | Avg: 37m 42s | Max: 37m 42s
      🟩 Clang17            Pass: 100%/1   | Total: 35m 29s | Avg: 35m 29s | Max: 35m 29s
      🟨 Clang18            Pass:  85%/7   | Total:  3h 12m | Avg: 27m 29s | Max: 36m 19s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 12m | Avg: 36m 27s | Max: 36m 46s
      🟩 GCC8               Pass: 100%/1   | Total: 36m 19s | Avg: 36m 19s | Max: 36m 19s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 12m | Avg: 36m 17s | Max: 36m 21s
      🟩 GCC10              Pass: 100%/1   | Total: 37m 57s | Avg: 37m 57s | Max: 37m 57s
      🟩 GCC11              Pass: 100%/1   | Total: 35m 27s | Avg: 35m 27s | Max: 35m 27s
      🟩 GCC12              Pass: 100%/1   | Total: 37m 40s | Avg: 37m 40s | Max: 37m 40s
      🟨 GCC13              Pass:  87%/8   | Total:  3h 08m | Avg: 23m 34s | Max: 37m 41s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 07m | Hits: 117%/3704  
      🟨 MSVC14.39          Pass:  66%/3   | Total:  3h 01m | Avg:  1h 00m | Max:  1h 13m | Hits: 125%/3704  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m
    🟨 cxx_family
      🟨 Clang              Pass:  92%/14  | Total:  7h 21m | Avg: 31m 32s | Max: 37m 42s
      🟨 GCC                Pass:  93%/16  | Total:  8h 01m | Avg: 30m 05s | Max: 37m 57s
      🟨 MSVC               Pass:  80%/5   | Total:  5h 16m | Avg:  1h 03m | Max:  1h 13m | Hits: 121%/7408  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m
    🟨 gpu
      🟨 v100               Pass:  91%/37  | Total: 22h 55m | Avg: 37m 09s | Max:  1h 13m | Hits: 121%/7408  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 40m 44s | Avg: 20m 22s | Max: 28m 41s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 24m 00s | Avg: 24m 00s | Max: 24m 00s
    
  • 🟩 cub: Pass: 100%/38 | Total: 1d 03h | Avg: 42m 58s | Max: 1h 10m | Hits: 330%/3564

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  1d 01h | Avg: 42m 41s | Max:  1h 10m | Hits: 330%/3564  
      🟩 arm64              Pass: 100%/2   | Total:  1h 36m | Avg: 48m 19s | Max: 49m 07s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 02m | Avg: 48m 33s | Max:  1h 03m | Hits: 329%/891   
      🟩 12.5               Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m
      🟩 12.6               Pass: 100%/31  | Total: 21h 00m | Avg: 40m 38s | Max:  1h 10m | Hits: 330%/2673  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 55m | Avg: 57m 30s | Max: 59m 55s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 02m | Avg: 48m 33s | Max:  1h 03m | Hits: 329%/891   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m
      🟩 nvcc12.6           Pass: 100%/29  | Total: 19h 05m | Avg: 39m 29s | Max:  1h 10m | Hits: 330%/2673  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 30s | Max: 59m 55s
      🟩 nvcc               Pass: 100%/36  | Total:  1d 01h | Avg: 42m 10s | Max:  1h 10m | Hits: 330%/3564  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 55m | Avg: 43m 57s | Max: 45m 57s
      🟩 Clang15            Pass: 100%/1   | Total: 47m 15s | Avg: 47m 15s | Max: 47m 15s
      🟩 Clang16            Pass: 100%/1   | Total: 42m 45s | Avg: 42m 45s | Max: 42m 45s
      🟩 Clang17            Pass: 100%/1   | Total: 42m 10s | Avg: 42m 10s | Max: 42m 10s
      🟩 Clang18            Pass: 100%/7   | Total:  4h 54m | Avg: 42m 00s | Max: 59m 55s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 30m | Avg: 45m 14s | Max: 47m 01s
      🟩 GCC8               Pass: 100%/1   | Total: 45m 38s | Avg: 45m 38s | Max: 45m 38s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 28m | Avg: 44m 03s | Max: 44m 57s
      🟩 GCC10              Pass: 100%/1   | Total: 46m 40s | Avg: 46m 40s | Max: 46m 40s
      🟩 GCC11              Pass: 100%/1   | Total: 42m 50s | Avg: 42m 50s | Max: 42m 50s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 28m | Avg: 29m 20s | Max: 47m 57s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 54m | Avg: 29m 21s | Max: 47m 32s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 06m | Hits: 331%/1782  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 10m | Hits: 329%/1782  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total: 10h 02m | Avg: 43m 00s | Max: 59m 55s
      🟩 GCC                Pass: 100%/18  | Total: 10h 36m | Avg: 35m 22s | Max: 47m 57s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 24m | Avg:  1h 06m | Max:  1h 10m | Hits: 330%/3564  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 40m 05s | Avg: 20m 02s | Max: 20m 32s
      🟩 v100               Pass: 100%/36  | Total:  1d 02h | Avg: 44m 15s | Max:  1h 10m | Hits: 330%/3564  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  1d 00h | Avg: 48m 13s | Max:  1h 10m | Hits: 330%/3564  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 17m 35s | Avg: 17m 35s | Max: 17m 35s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 36s | Avg: 14m 36s | Max: 14m 36s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 00m | Avg: 20m 15s | Max: 20m 53s
      🟩 TestGPU            Pass: 100%/2   | Total: 45m 35s | Avg: 22m 47s | Max: 23m 27s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 40m 05s | Avg: 20m 02s | Max: 20m 32s
      🟩 90a                Pass: 100%/1   | Total: 20m 42s | Avg: 20m 42s | Max: 20m 42s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 11h 58m | Avg: 51m 21s | Max:  1h 06m | Hits: 331%/2673  
      🟩 20                 Pass: 100%/24  | Total: 15h 14m | Avg: 38m 05s | Max:  1h 10m | Hits: 328%/891   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 18s | Avg: 5m 09s | Max: 8m 08s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  8m 08s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  8m 08s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  8m 08s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  8m 08s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  8m 08s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  8m 08s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  8m 08s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s
      🟩 Test               Pass: 100%/1   | Total:  8m 08s | Avg:  8m 08s | Max:  8m 08s
    
  • 🟩 python: Pass: 100%/1 | Total: 28m 04s | Avg: 28m 04s | Max: 28m 04s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 28m 04s | Avg: 28m 04s | Max: 28m 04s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 28m 04s | Avg: 28m 04s | Max: 28m 04s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 28m 04s | Avg: 28m 04s | Max: 28m 04s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 28m 04s | Avg: 28m 04s | Max: 28m 04s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 28m 04s | Avg: 28m 04s | Max: 28m 04s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 28m 04s | Avg: 28m 04s | Max: 28m 04s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 28m 04s | Avg: 28m 04s | Max: 28m 04s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 28m 04s | Avg: 28m 04s | Max: 28m 04s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Contributor

🟩 CI finished in 2h 10m: Pass: 100%/78 | Total: 2d 08h | Avg: 43m 17s | Max: 1h 17m | Hits: 195%/12824
  • 🟩 cub: Pass: 100%/38 | Total: 1d 07h | Avg: 50m 25s | Max: 1h 09m | Hits: 238%/3564

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  1d 06h | Avg: 50m 04s | Max:  1h 09m | Hits: 238%/3564  
      🟩 arm64              Pass: 100%/2   | Total:  1h 53m | Avg: 56m 35s | Max: 56m 49s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 54m | Avg: 58m 49s | Max:  1h 02m | Hits: 238%/891   
      🟩 12.5               Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 07m
      🟩 12.6               Pass: 100%/31  | Total:  1d 00h | Avg: 48m 02s | Max:  1h 09m | Hits: 237%/2673  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 57m | Avg: 58m 35s | Max: 59m 39s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 54m | Avg: 58m 49s | Max:  1h 02m | Hits: 238%/891   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 07m
      🟩 nvcc12.6           Pass: 100%/29  | Total: 22h 52m | Avg: 47m 19s | Max:  1h 09m | Hits: 237%/2673  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 35s | Max: 59m 39s
      🟩 nvcc               Pass: 100%/36  | Total:  1d 05h | Avg: 49m 57s | Max:  1h 09m | Hits: 238%/3564  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 47m | Avg: 56m 47s | Max: 59m 59s
      🟩 Clang15            Pass: 100%/1   | Total: 56m 42s | Avg: 56m 42s | Max: 56m 42s
      🟩 Clang16            Pass: 100%/1   | Total: 54m 33s | Avg: 54m 33s | Max: 54m 33s
      🟩 Clang17            Pass: 100%/1   | Total: 59m 39s | Avg: 59m 39s | Max: 59m 39s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 35m | Avg: 47m 54s | Max: 59m 39s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 52m | Avg: 56m 24s | Max: 58m 54s
      🟩 GCC8               Pass: 100%/1   | Total: 57m 35s | Avg: 57m 35s | Max: 57m 35s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 50m | Avg: 55m 24s | Max: 55m 59s
      🟩 GCC10              Pass: 100%/1   | Total: 54m 33s | Avg: 54m 33s | Max: 54m 33s
      🟩 GCC11              Pass: 100%/1   | Total: 56m 04s | Avg: 56m 04s | Max: 56m 04s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 40m | Avg: 33m 22s | Max: 54m 20s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 51m | Avg: 36m 27s | Max:  1h 01m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 05m | Hits: 238%/1782  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m | Hits: 237%/1782  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 07m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total: 12h 13m | Avg: 52m 23s | Max: 59m 59s
      🟩 GCC                Pass: 100%/18  | Total: 13h 03m | Avg: 43m 31s | Max:  1h 01m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 26m | Avg:  1h 06m | Max:  1h 09m | Hits: 238%/3564  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 07m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 45m 46s | Avg: 22m 53s | Max: 26m 22s
      🟩 v100               Pass: 100%/36  | Total:  1d 07h | Avg: 51m 56s | Max:  1h 09m | Hits: 238%/3564  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  1d 05h | Avg: 56m 32s | Max:  1h 09m | Hits: 238%/3564  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 45s | Avg: 19m 45s | Max: 19m 45s
      🟩 GraphCapture       Pass: 100%/1   | Total: 21m 43s | Avg: 21m 43s | Max: 21m 43s
      🟩 HostLaunch         Pass: 100%/3   | Total: 59m 24s | Avg: 19m 48s | Max: 20m 22s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 06s | Max: 31m 13s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 45m 46s | Avg: 22m 53s | Max: 26m 22s
      🟩 90a                Pass: 100%/1   | Total: 25m 21s | Avg: 25m 21s | Max: 25m 21s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 13h 47m | Avg: 59m 06s | Max:  1h 08m | Hits: 238%/2673  
      🟩 20                 Pass: 100%/24  | Total: 18h 08m | Avg: 45m 20s | Max:  1h 09m | Hits: 235%/891   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 23h 24m | Avg: 37m 57s | Max: 1h 17m | Hits: 178%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 57m 01s | Avg: 28m 30s | Max: 33m 52s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 22h 17m | Avg: 38m 13s | Max:  1h 17m | Hits: 178%/9260  
      🟩 arm64              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 16s | Max: 34m 58s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 29m | Avg: 41m 54s | Max:  1h 07m | Hits: 142%/1852  
      🟩 12.5               Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
      🟩 12.6               Pass: 100%/30  | Total: 17h 30m | Avg: 35m 00s | Max:  1h 17m | Hits: 187%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 01m | Avg: 30m 50s | Max: 30m 56s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 29m | Avg: 41m 54s | Max:  1h 07m | Hits: 142%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
      🟩 nvcc12.6           Pass: 100%/28  | Total: 16h 28m | Avg: 35m 18s | Max:  1h 17m | Hits: 187%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 01m | Avg: 30m 50s | Max: 30m 56s
      🟩 nvcc               Pass: 100%/35  | Total: 22h 22m | Avg: 38m 21s | Max:  1h 17m | Hits: 178%/9260  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 21m | Avg: 35m 23s | Max: 35m 37s
      🟩 Clang15            Pass: 100%/1   | Total: 34m 37s | Avg: 34m 37s | Max: 34m 37s
      🟩 Clang16            Pass: 100%/1   | Total: 36m 32s | Avg: 36m 32s | Max: 36m 32s
      🟩 Clang17            Pass: 100%/1   | Total: 37m 30s | Avg: 37m 30s | Max: 37m 30s
      🟩 Clang18            Pass: 100%/7   | Total:  3h 06m | Avg: 26m 36s | Max: 36m 26s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 24s | Max: 34m 25s
      🟩 GCC8               Pass: 100%/1   | Total: 37m 35s | Avg: 37m 35s | Max: 37m 35s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 14m | Avg: 37m 14s | Max: 37m 42s
      🟩 GCC10              Pass: 100%/1   | Total: 36m 24s | Avg: 36m 24s | Max: 36m 24s
      🟩 GCC11              Pass: 100%/1   | Total: 35m 41s | Avg: 35m 41s | Max: 35m 41s
      🟩 GCC12              Pass: 100%/1   | Total: 39m 23s | Avg: 39m 23s | Max: 39m 23s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 28m | Avg: 26m 02s | Max: 39m 48s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m | Hits: 135%/3704  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 07m | Avg:  1h 02m | Max:  1h 17m | Hits: 207%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  7h 16m | Avg: 31m 10s | Max: 37m 30s
      🟩 GCC                Pass: 100%/16  | Total:  8h 20m | Avg: 31m 17s | Max: 39m 48s
      🟩 MSVC               Pass: 100%/5   | Total:  5h 22m | Avg:  1h 04m | Max:  1h 17m | Hits: 178%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 23h 24m | Avg: 37m 57s | Max:  1h 17m | Hits: 178%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 21h 40m | Avg: 41m 57s | Max:  1h 17m | Hits: 132%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 52m 21s | Avg: 17m 27s | Max: 36m 54s | Hits: 365%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 51m 24s | Avg: 17m 08s | Max: 23m 09s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 22m 15s | Avg: 22m 15s | Max: 22m 15s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 10h 35m | Avg: 45m 24s | Max:  1h 12m | Hits: 133%/5556  
      🟩 20                 Pass: 100%/21  | Total: 11h 51m | Avg: 33m 52s | Max:  1h 17m | Hits: 247%/3704  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 08s | Avg: 5m 34s | Max: 8m 53s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  8m 53s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  8m 53s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  8m 53s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  8m 53s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  8m 53s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  8m 53s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  8m 53s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s
      🟩 Test               Pass: 100%/1   | Total:  8m 53s | Avg:  8m 53s | Max:  8m 53s
    
  • 🟩 python: Pass: 100%/1 | Total: 45m 13s | Avg: 45m 13s | Max: 45m 13s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 45m 13s | Avg: 45m 13s | Max: 45m 13s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 45m 13s | Avg: 45m 13s | Max: 45m 13s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 45m 13s | Avg: 45m 13s | Max: 45m 13s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 45m 13s | Avg: 45m 13s | Max: 45m 13s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 45m 13s | Avg: 45m 13s | Max: 45m 13s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 45m 13s | Avg: 45m 13s | Max: 45m 13s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 45m 13s | Avg: 45m 13s | Max: 45m 13s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 45m 13s | Avg: 45m 13s | Max: 45m 13s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Collaborator

@miscco miscco left a comment

This is already looking great.

I need to close my eyes to the things Thrust is doing with the iterators, but that is preexisting and not part of this PR.

Comment on lines 47 to 48

#include <cuda/std/__cccl/cuda_capabilities.h>
Collaborator

This header is part of our global config and should not be included separately

Suggested change
#include <cuda/std/__cccl/cuda_capabilities.h>

Contributor Author

I included this because I need it for _CCCL_PDL_GRID_DEPENDENCY_SYNC(). Which header file should I be including instead?

Collaborator

@miscco miscco Jan 21, 2025

None. All of that is already in the CUB config.

Everything within cuda/std/__cccl is supposed to be globally available, so every config includes it first.

Contributor Author

We discussed this offline, but the issue is that this header has not yet been included in the CCCL config, so I believe this is fine for now.

cub/cub/agent/agent_merge_sort.cuh (outdated, resolved)
cub/cub/util_policy_wrapper_t.cuh (outdated, resolved)
#include <iterator>
#include <type_traits>
#include <utility>
#include <cuda/std/iterator> // Needed for __gnu_cxx::__normal_iterator
Collaborator

No change requested, but all of this needs to go and be replaced by ::cuda::std::contiguous_iterator
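For readers unfamiliar with the concept mentioned above, here is a minimal host-only sketch of what a concept-based check could look like. It uses the standard library's `std::contiguous_iterator` (C++20) as a stand-in for `::cuda::std::contiguous_iterator`; the trait name `is_contiguous_v` is hypothetical and is not part of Thrust or libcu++. Before C++20 the sketch assumes a conservative fallback that only recognizes raw pointers.

```cpp
#include <iterator>
#include <list>
#include <type_traits>

// Hypothetical trait name, for illustration only.
#if __cplusplus >= 202002L
// C++20: defer to the standard concept instead of special-casing
// implementation-specific iterator types such as __gnu_cxx::__normal_iterator.
template <class It>
inline constexpr bool is_contiguous_v = std::contiguous_iterator<It>;
#else
// Assumed pre-C++20 fallback: only raw pointers are treated as contiguous.
template <class It>
inline constexpr bool is_contiguous_v = std::is_pointer<It>::value;
#endif

// Raw pointers are contiguous in both branches.
static_assert(is_contiguous_v<int*>, "raw pointers are contiguous");
// Linked-list iterators are never contiguous.
static_assert(!is_contiguous_v<std::list<int>::iterator>,
              "list iterators are not contiguous");
```

The appeal of the concept is that no vendor-specific iterator type has to be named, which presumably also removes the need for includes that exist only to make such types visible.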

Contributor

🟩 CI finished in 1h 51m: Pass: 100%/78 | Total: 2d 06h | Avg: 42m 03s | Max: 1h 13m | Hits: 241%/12784
  • 🟩 cub: Pass: 100%/38 | Total: 1d 08h | Avg: 51m 13s | Max: 1h 13m | Hits: 332%/3564

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  1d 06h | Avg: 50m 52s | Max:  1h 13m | Hits: 332%/3564  
      🟩 arm64              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 34s | Max: 59m 11s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 52m | Avg: 58m 32s | Max:  1h 04m | Hits: 334%/891   
      🟩 12.5               Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
      🟩 12.6               Pass: 100%/31  | Total:  1d 01h | Avg: 49m 02s | Max:  1h 13m | Hits: 332%/2673  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 52m | Avg: 58m 32s | Max:  1h 04m | Hits: 334%/891   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
      🟩 nvcc12.6           Pass: 100%/29  | Total: 23h 19m | Avg: 48m 16s | Max:  1h 13m | Hits: 332%/2673  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m
      🟩 nvcc               Pass: 100%/36  | Total:  1d 06h | Avg: 50m 44s | Max:  1h 13m | Hits: 332%/3564  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 54m | Avg: 58m 34s | Max:  1h 00m
      🟩 Clang15            Pass: 100%/1   | Total: 54m 52s | Avg: 54m 52s | Max: 54m 52s
      🟩 Clang16            Pass: 100%/1   | Total: 56m 26s | Avg: 56m 26s | Max: 56m 26s
      🟩 Clang17            Pass: 100%/1   | Total: 55m 50s | Avg: 55m 50s | Max: 55m 50s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 45m | Avg: 49m 19s | Max:  1h 02m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 52m | Avg: 56m 15s | Max: 59m 37s
      🟩 GCC8               Pass: 100%/1   | Total: 55m 09s | Avg: 55m 09s | Max: 55m 09s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 52m | Avg: 56m 24s | Max: 58m 09s
      🟩 GCC10              Pass: 100%/1   | Total: 59m 38s | Avg: 59m 38s | Max: 59m 38s
      🟩 GCC11              Pass: 100%/1   | Total: 59m 21s | Avg: 59m 21s | Max: 59m 21s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 46m | Avg: 35m 37s | Max:  1h 02m
      🟩 GCC13              Pass: 100%/8   | Total:  4h 49m | Avg: 36m 09s | Max: 59m 45s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits: 334%/1782  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 13m | Hits: 330%/1782  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total: 12h 26m | Avg: 53m 20s | Max:  1h 02m
      🟩 GCC                Pass: 100%/18  | Total: 13h 15m | Avg: 44m 11s | Max:  1h 02m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 30m | Avg:  1h 07m | Max:  1h 13m | Hits: 332%/3564  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 44m 00s | Avg: 22m 00s | Max: 25m 00s
      🟩 v100               Pass: 100%/36  | Total:  1d 07h | Avg: 52m 51s | Max:  1h 13m | Hits: 332%/3564  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  1d 05h | Avg: 57m 40s | Max:  1h 13m | Hits: 332%/3564  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 56s | Avg: 20m 56s | Max: 20m 56s
      🟩 GraphCapture       Pass: 100%/1   | Total: 20m 52s | Avg: 20m 52s | Max: 20m 52s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 04m | Avg: 21m 21s | Max: 23m 41s
      🟩 TestGPU            Pass: 100%/2   | Total: 53m 00s | Avg: 26m 30s | Max: 30m 09s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 44m 00s | Avg: 22m 00s | Max: 25m 00s
      🟩 90a                Pass: 100%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 14h 02m | Avg:  1h 00m | Max:  1h 13m | Hits: 335%/2673  
      🟩 20                 Pass: 100%/24  | Total: 18h 24m | Avg: 46m 01s | Max:  1h 08m | Hits: 326%/891   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 21h 18m | Avg: 34m 32s | Max: 1h 07m | Hits: 205%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total:  1h 05m | Avg: 32m 31s | Max: 34m 41s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 20h 18m | Avg: 34m 48s | Max:  1h 07m | Hits: 205%/9220  
      🟩 arm64              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 00s | Max: 31m 21s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 06m | Avg: 37m 23s | Max: 55m 45s | Hits: 168%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 07m
      🟩 12.6               Pass: 100%/30  | Total: 16h 04m | Avg: 32m 09s | Max:  1h 06m | Hits: 215%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 56m 06s | Avg: 28m 03s | Max: 30m 38s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 06m | Avg: 37m 23s | Max: 55m 45s | Hits: 168%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 07m
      🟩 nvcc12.6           Pass: 100%/28  | Total: 15h 08m | Avg: 32m 26s | Max:  1h 06m | Hits: 215%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 56m 06s | Avg: 28m 03s | Max: 30m 38s
      🟩 nvcc               Pass: 100%/35  | Total: 20h 22m | Avg: 34m 55s | Max:  1h 07m | Hits: 205%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 08m | Avg: 32m 14s | Max: 33m 18s
      🟩 Clang15            Pass: 100%/1   | Total: 31m 34s | Avg: 31m 34s | Max: 31m 34s
      🟩 Clang16            Pass: 100%/1   | Total: 32m 29s | Avg: 32m 29s | Max: 32m 29s
      🟩 Clang17            Pass: 100%/1   | Total: 34m 14s | Avg: 34m 14s | Max: 34m 14s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 50m | Avg: 24m 18s | Max: 31m 50s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 06m | Avg: 33m 05s | Max: 33m 17s
      🟩 GCC8               Pass: 100%/1   | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 52s | Max: 35m 05s
      🟩 GCC10              Pass: 100%/1   | Total: 33m 24s | Avg: 33m 24s | Max: 33m 24s
      🟩 GCC11              Pass: 100%/1   | Total: 35m 49s | Avg: 35m 49s | Max: 35m 49s
      🟩 GCC12              Pass: 100%/1   | Total: 33m 00s | Avg: 33m 00s | Max: 33m 00s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 24m | Avg: 25m 32s | Max: 35m 11s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 43s | Max: 57m 42s | Hits: 171%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 46m | Avg: 55m 25s | Max:  1h 06m | Hits: 228%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 07m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  6h 37m | Avg: 28m 23s | Max: 34m 14s
      🟩 GCC                Pass: 100%/16  | Total:  7h 54m | Avg: 29m 39s | Max: 35m 49s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 39m | Avg: 55m 56s | Max:  1h 06m | Hits: 205%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 07m
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 21h 18m | Avg: 34m 32s | Max:  1h 07m | Hits: 205%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 19h 25m | Avg: 37m 35s | Max:  1h 07m | Hits: 165%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 50m 52s | Avg: 16m 57s | Max: 34m 59s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 01m | Avg: 20m 37s | Max: 34m 41s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 18m 02s | Avg: 18m 02s | Max: 18m 02s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  9h 30m | Avg: 40m 44s | Max:  1h 07m | Hits: 167%/5532  
      🟩 20                 Pass: 100%/21  | Total: 10h 42m | Avg: 30m 36s | Max:  1h 06m | Hits: 262%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 43s | Avg: 4m 51s | Max: 7m 41s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 43s | Avg:  4m 51s | Max:  7m 41s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 43s | Avg:  4m 51s | Max:  7m 41s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 43s | Avg:  4m 51s | Max:  7m 41s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 43s | Avg:  4m 51s | Max:  7m 41s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 43s | Avg:  4m 51s | Max:  7m 41s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 43s | Avg:  4m 51s | Max:  7m 41s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 43s | Avg:  4m 51s | Max:  7m 41s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 02s | Avg:  2m 02s | Max:  2m 02s
      🟩 Test               Pass: 100%/1   | Total:  7m 41s | Avg:  7m 41s | Max:  7m 41s
    
  • 🟩 python: Pass: 100%/1 | Total: 45m 22s | Avg: 45m 22s | Max: 45m 22s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 45m 22s | Avg: 45m 22s | Max: 45m 22s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 45m 22s | Avg: 45m 22s | Max: 45m 22s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 45m 22s | Avg: 45m 22s | Max: 45m 22s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 45m 22s | Avg: 45m 22s | Max: 45m 22s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 45m 22s | Avg: 45m 22s | Max: 45m 22s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 45m 22s | Avg: 45m 22s | Max: 45m 22s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 45m 22s | Avg: 45m 22s | Max: 45m 22s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 45m 22s | Avg: 45m 22s | Max: 45m 22s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@NaderAlAwar NaderAlAwar requested a review from miscco January 21, 2025 22:06
Contributor

🟩 CI finished in 2h 32m: Pass: 100%/78 | Total: 2d 10h | Avg: 44m 55s | Max: 1h 19m | Hits: 102%/12784
  • 🟩 cub: Pass: 100%/38 | Total: 1d 09h | Avg: 53m 08s | Max: 1h 19m | Hits: 34%/3564

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  1d 07h | Avg: 52m 33s | Max:  1h 19m | Hits:  34%/3564  
      🟩 arm64              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 56m | Avg: 59m 19s | Max:  1h 03m | Hits:  34%/891   
      🟩 12.5               Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 15m
      🟩 12.6               Pass: 100%/31  | Total:  1d 02h | Avg: 50m 51s | Max:  1h 19m | Hits:  34%/2673  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 56m | Avg: 59m 19s | Max:  1h 03m | Hits:  34%/891   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 15m
      🟩 nvcc12.6           Pass: 100%/29  | Total:  1d 00h | Avg: 50m 03s | Max:  1h 19m | Hits:  34%/2673  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m
      🟩 nvcc               Pass: 100%/36  | Total:  1d 07h | Avg: 52m 37s | Max:  1h 19m | Hits:  34%/3564  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 47m | Avg: 56m 47s | Max:  1h 01m
      🟩 Clang15            Pass: 100%/1   | Total: 56m 51s | Avg: 56m 51s | Max: 56m 51s
      🟩 Clang16            Pass: 100%/1   | Total: 53m 19s | Avg: 53m 19s | Max: 53m 19s
      🟩 Clang17            Pass: 100%/1   | Total: 54m 17s | Avg: 54m 17s | Max: 54m 17s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 42m | Avg: 48m 58s | Max:  1h 04m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 55m | Avg: 57m 39s | Max: 58m 59s
      🟩 GCC8               Pass: 100%/1   | Total: 52m 51s | Avg: 52m 51s | Max: 52m 51s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 52s | Max:  1h 00m
      🟩 GCC10              Pass: 100%/1   | Total: 54m 08s | Avg: 54m 08s | Max: 54m 08s
      🟩 GCC11              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
      🟩 GCC12              Pass: 100%/3   | Total:  1h 40m | Avg: 33m 26s | Max: 55m 41s
      🟩 GCC13              Pass: 100%/8   | Total:  5h 52m | Avg: 44m 02s | Max:  1h 05m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 06m | Hits:  34%/1782  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 35m | Avg:  1h 17m | Max:  1h 19m | Hits:  34%/1782  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 15m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total: 12h 14m | Avg: 52m 27s | Max:  1h 04m
      🟩 GCC                Pass: 100%/18  | Total: 14h 13m | Avg: 47m 24s | Max:  1h 05m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 45m | Avg:  1h 11m | Max:  1h 19m | Hits:  34%/3564  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 15m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 44m 38s | Avg: 22m 19s | Max: 25m 22s
      🟩 v100               Pass: 100%/36  | Total:  1d 08h | Avg: 54m 51s | Max:  1h 19m | Hits:  34%/3564  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  1d 06h | Avg: 58m 17s | Max:  1h 19m | Hits:  34%/3564  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 40m 17s | Avg: 40m 17s | Max: 40m 17s
      🟩 GraphCapture       Pass: 100%/1   | Total: 19m 02s | Avg: 19m 02s | Max: 19m 02s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 25m | Avg: 28m 31s | Max: 46m 48s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 07m | Avg: 33m 45s | Max: 44m 06s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 44m 38s | Avg: 22m 19s | Max: 25m 22s
      🟩 90a                Pass: 100%/1   | Total: 24m 35s | Avg: 24m 35s | Max: 24m 35s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 14h 07m | Avg:  1h 00m | Max:  1h 15m | Hits:  34%/2673  
      🟩 20                 Pass: 100%/24  | Total: 19h 31m | Avg: 48m 49s | Max:  1h 19m | Hits:  34%/891   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 23h 31m | Avg: 38m 09s | Max: 1h 17m | Hits: 129%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 42m 31s | Avg: 21m 15s | Max: 31m 28s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 22h 24m | Avg: 38m 25s | Max:  1h 17m | Hits: 129%/9220  
      🟩 arm64              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 40s | Max: 35m 06s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 33m | Avg: 42m 45s | Max:  1h 08m | Hits:  74%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 14m
      🟩 12.6               Pass: 100%/30  | Total: 17h 32m | Avg: 35m 05s | Max:  1h 17m | Hits: 143%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 05m | Avg: 32m 48s | Max: 34m 45s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 33m | Avg: 42m 45s | Max:  1h 08m | Hits:  74%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 14m
      🟩 nvcc12.6           Pass: 100%/28  | Total: 16h 27m | Avg: 35m 15s | Max:  1h 17m | Hits: 143%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 05m | Avg: 32m 48s | Max: 34m 45s
      🟩 nvcc               Pass: 100%/35  | Total: 22h 26m | Avg: 38m 28s | Max:  1h 17m | Hits: 129%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 23m | Avg: 35m 54s | Max: 37m 13s
      🟩 Clang15            Pass: 100%/1   | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s
      🟩 Clang16            Pass: 100%/1   | Total: 34m 03s | Avg: 34m 03s | Max: 34m 03s
      🟩 Clang17            Pass: 100%/1   | Total: 38m 53s | Avg: 38m 53s | Max: 38m 53s
      🟩 Clang18            Pass: 100%/7   | Total:  3h 05m | Avg: 26m 26s | Max: 35m 11s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 11m | Avg: 35m 42s | Max: 35m 58s
      🟩 GCC8               Pass: 100%/1   | Total: 38m 13s | Avg: 38m 13s | Max: 38m 13s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 17m | Avg: 38m 40s | Max: 38m 46s
      🟩 GCC10              Pass: 100%/1   | Total: 36m 58s | Avg: 36m 58s | Max: 36m 58s
      🟩 GCC11              Pass: 100%/1   | Total: 36m 38s | Avg: 36m 38s | Max: 36m 38s
      🟩 GCC12              Pass: 100%/1   | Total: 42m 36s | Avg: 42m 36s | Max: 42m 36s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 21m | Avg: 25m 10s | Max: 39m 42s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 08m | Hits:  76%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 05m | Avg:  1h 01m | Max:  1h 17m | Hits: 164%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 14m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  7h 20m | Avg: 31m 27s | Max: 38m 53s
      🟩 GCC                Pass: 100%/16  | Total:  8h 24m | Avg: 31m 32s | Max: 42m 36s
      🟩 MSVC               Pass: 100%/5   | Total:  5h 21m | Avg:  1h 04m | Max:  1h 17m | Hits: 129%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 14m
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 23h 31m | Avg: 38m 09s | Max:  1h 17m | Hits: 129%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 22h 05m | Avg: 42m 45s | Max:  1h 17m | Hits:  70%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 06s | Avg: 16m 22s | Max: 33m 11s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 37m 26s | Avg: 12m 28s | Max: 14m 57s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 28m 53s | Avg: 28m 53s | Max: 28m 53s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 10h 48m | Avg: 46m 17s | Max:  1h 14m | Hits:  72%/5532  
      🟩 20                 Pass: 100%/21  | Total: 12h 01m | Avg: 34m 21s | Max:  1h 17m | Hits: 214%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 23s | Avg: 4m 41s | Max: 7m 11s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 11s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 11s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 11s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 11s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 11s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 11s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 11s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 12s | Avg:  2m 12s | Max:  2m 12s
      🟩 Test               Pass: 100%/1   | Total:  7m 11s | Avg:  7m 11s | Max:  7m 11s
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 03m | Avg: 1h 03m | Max: 1h 03m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Contributor

🟩 CI finished in 2h 01m: Pass: 100%/78 | Total: 2d 08h | Avg: 43m 34s | Max: 1h 11m | Hits: 194%/12784
  • 🟩 cub: Pass: 100%/38 | Total: 1d 09h | Avg: 52m 09s | Max: 1h 07m | Hits: 252%/3564

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  1d 07h | Avg: 51m 51s | Max:  1h 07m | Hits: 252%/3564  
      🟩 arm64              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 33s | Max: 58m 40s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 01m | Avg:  1h 00m | Max:  1h 01m | Hits: 252%/891   
      🟩 12.5               Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 04m
      🟩 12.6               Pass: 100%/31  | Total:  1d 01h | Avg: 50m 05s | Max:  1h 07m | Hits: 252%/2673  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 02m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 01m | Avg:  1h 00m | Max:  1h 01m | Hits: 252%/891   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 04m
      🟩 nvcc12.6           Pass: 100%/29  | Total: 23h 48m | Avg: 49m 14s | Max:  1h 07m | Hits: 252%/2673  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 02m
      🟩 nvcc               Pass: 100%/36  | Total:  1d 06h | Avg: 51m 35s | Max:  1h 07m | Hits: 252%/3564  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 01m | Avg:  1h 00m | Max:  1h 01m
      🟩 Clang15            Pass: 100%/1   | Total: 54m 46s | Avg: 54m 46s | Max: 54m 46s
      🟩 Clang16            Pass: 100%/1   | Total: 57m 01s | Avg: 57m 01s | Max: 57m 01s
      🟩 Clang17            Pass: 100%/1   | Total: 59m 34s | Avg: 59m 34s | Max: 59m 34s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 44m | Avg: 49m 16s | Max:  1h 02m
      🟩 GCC7               Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m
      🟩 GCC8               Pass: 100%/1   | Total: 55m 06s | Avg: 55m 06s | Max: 55m 06s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 49s | Max:  1h 00m
      🟩 GCC10              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
      🟩 GCC11              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
      🟩 GCC12              Pass: 100%/3   | Total:  1h 47m | Avg: 35m 45s | Max: 59m 28s
      🟩 GCC13              Pass: 100%/8   | Total:  5h 15m | Avg: 39m 22s | Max:  1h 00m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 05m | Hits: 252%/1782  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 07m | Hits: 252%/1782  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 04m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total: 12h 37m | Avg: 54m 05s | Max:  1h 02m
      🟩 GCC                Pass: 100%/18  | Total: 13h 57m | Avg: 46m 32s | Max:  1h 02m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 19m | Avg:  1h 04m | Max:  1h 07m | Hits: 252%/3564  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 04m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 47m 48s | Avg: 23m 54s | Max: 27m 44s
      🟩 v100               Pass: 100%/36  | Total:  1d 08h | Avg: 53m 43s | Max:  1h 07m | Hits: 252%/3564  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  1d 05h | Avg: 57m 52s | Max:  1h 07m | Hits: 252%/3564  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 33m 53s | Avg: 33m 53s | Max: 33m 53s
      🟩 GraphCapture       Pass: 100%/1   | Total: 22m 11s | Avg: 22m 11s | Max: 22m 11s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 03m | Avg: 21m 17s | Max: 23m 58s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 07m | Avg: 33m 47s | Max: 39m 52s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 47m 48s | Avg: 23m 54s | Max: 27m 44s
      🟩 90a                Pass: 100%/1   | Total: 25m 48s | Avg: 25m 48s | Max: 25m 48s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 14h 06m | Avg:  1h 00m | Max:  1h 05m | Hits: 252%/2673  
      🟩 20                 Pass: 100%/24  | Total: 18h 55m | Avg: 47m 17s | Max:  1h 07m | Hits: 252%/891   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 22h 38m | Avg: 36m 42s | Max: 1h 11m | Hits: 171%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 42m 14s | Avg: 21m 07s | Max: 30m 42s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 21h 28m | Avg: 36m 48s | Max:  1h 11m | Hits: 171%/9220  
      🟩 arm64              Pass: 100%/2   | Total:  1h 09m | Avg: 34m 49s | Max: 37m 04s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 23m | Avg: 40m 47s | Max:  1h 02m | Hits: 119%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m
      🟩 12.6               Pass: 100%/30  | Total: 17h 00m | Avg: 34m 01s | Max:  1h 11m | Hits: 184%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 09m | Avg: 34m 43s | Max: 37m 41s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 23m | Avg: 40m 47s | Max:  1h 02m | Hits: 119%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m
      🟩 nvcc12.6           Pass: 100%/28  | Total: 15h 51m | Avg: 33m 58s | Max:  1h 11m | Hits: 184%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 09m | Avg: 34m 43s | Max: 37m 41s
      🟩 nvcc               Pass: 100%/35  | Total: 21h 28m | Avg: 36m 49s | Max:  1h 11m | Hits: 171%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 23m | Avg: 35m 51s | Max: 38m 25s
      🟩 Clang15            Pass: 100%/1   | Total: 33m 40s | Avg: 33m 40s | Max: 33m 40s
      🟩 Clang16            Pass: 100%/1   | Total: 33m 51s | Avg: 33m 51s | Max: 33m 51s
      🟩 Clang17            Pass: 100%/1   | Total: 34m 34s | Avg: 34m 34s | Max: 34m 34s
      🟩 Clang18            Pass: 100%/7   | Total:  3h 15m | Avg: 27m 55s | Max: 37m 41s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 46s | Max: 33m 49s
      🟩 GCC8               Pass: 100%/1   | Total: 33m 54s | Avg: 33m 54s | Max: 33m 54s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 15m | Avg: 37m 47s | Max: 37m 57s
      🟩 GCC10              Pass: 100%/1   | Total: 38m 24s | Avg: 38m 24s | Max: 38m 24s
      🟩 GCC11              Pass: 100%/1   | Total: 37m 41s | Avg: 37m 41s | Max: 37m 41s
      🟩 GCC12              Pass: 100%/1   | Total: 39m 10s | Avg: 39m 10s | Max: 39m 10s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 10m | Avg: 23m 50s | Max: 37m 04s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 02m | Hits: 129%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 56m | Avg: 58m 47s | Max:  1h 11m | Hits: 199%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  7h 20m | Avg: 31m 29s | Max: 38m 25s
      🟩 GCC                Pass: 100%/16  | Total:  8h 03m | Avg: 30m 11s | Max: 39m 10s
      🟩 MSVC               Pass: 100%/5   | Total:  5h 00m | Avg:  1h 00m | Max:  1h 11m | Hits: 171%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 22h 38m | Avg: 36m 42s | Max:  1h 11m | Hits: 171%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 21h 09m | Avg: 40m 56s | Max:  1h 11m | Hits: 123%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 50m 38s | Avg: 16m 52s | Max: 34m 49s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 38m 10s | Avg: 12m 43s | Max: 14m 27s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 21m 41s | Avg: 21m 41s | Max: 21m 41s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 10h 24m | Avg: 44m 34s | Max:  1h 11m | Hits: 125%/5532  
      🟩 20                 Pass: 100%/21  | Total: 11h 31m | Avg: 32m 56s | Max:  1h 10m | Hits: 241%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 8m 59s | Avg: 4m 29s | Max: 6m 40s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  6m 40s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  6m 40s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  6m 40s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  6m 40s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  6m 40s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  6m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  6m 40s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 19s | Avg:  2m 19s | Max:  2m 19s
      🟩 Test               Pass: 100%/1   | Total:  6m 40s | Avg:  6m 40s | Max:  6m 40s
    
  • 🟩 python: Pass: 100%/1 | Total: 49m 39s | Avg: 49m 39s | Max: 49m 39s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 49m 39s | Avg: 49m 39s | Max: 49m 39s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 49m 39s | Avg: 49m 39s | Max: 49m 39s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 49m 39s | Avg: 49m 39s | Max: 49m 39s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 49m 39s | Avg: 49m 39s | Max: 49m 39s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 49m 39s | Avg: 49m 39s | Max: 49m 39s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 49m 39s | Avg: 49m 39s | Max: 49m 39s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 49m 39s | Avg: 49m 39s | Max: 49m 39s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 49m 39s | Avg: 49m 39s | Max: 49m 39s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Contributor

@bernhardmgruber bernhardmgruber left a comment

I see a lot of thrust functionality being moved into individual headers. Is there a technical reason for this? Is it because of NVRTC?

Comment on lines +44 to +46
template <typename PolicyT, int BLOCK_THREADS_, int ITEMS_PER_THREAD_ = PolicyT::ITEMS_PER_THREAD>
struct policy_wrapper_t : PolicyT
{
Contributor

Q: Why does this struct need to be moved into a separate file?

Contributor Author

Yeah, this is due to NVRTC. Specifically, merge_sort.cuh needs policy_wrapper_t, but I can't include all of util_device.cuh because of NVRTC errors that I could not resolve easily. My strategy for this and similar situations was to move things into separate headers. For the most part I tried to fix the NVRTC errors first, and resorted to this approach only when that failed.
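
To illustrate the approach described above, here is a minimal sketch of what a standalone header holding only policy_wrapper_t might look like. The template head is taken from the diff under review; the member bodies, the `example_policy` type, and its values are assumptions made for illustration and may differ from the actual CUB sources.

```cpp
// Sketch of a self-contained header: it has no includes, so there is nothing
// for NVRTC to reject. Only the template head comes from the diff under
// review; the members below are assumed for illustration.
template <typename PolicyT, int BLOCK_THREADS_, int ITEMS_PER_THREAD_ = PolicyT::ITEMS_PER_THREAD>
struct policy_wrapper_t : PolicyT
{
  static constexpr int ITEMS_PER_THREAD = ITEMS_PER_THREAD_;
  static constexpr int BLOCK_THREADS    = BLOCK_THREADS_;
  static constexpr int ITEMS_PER_TILE   = BLOCK_THREADS * ITEMS_PER_THREAD;
};

// Hypothetical tuning policy, used only to exercise the wrapper.
struct example_policy
{
  static constexpr int ITEMS_PER_THREAD = 4;
};

// Override the block size and inherit ITEMS_PER_THREAD from the policy.
using wrapped_policy = policy_wrapper_t<example_policy, 256>;

static_assert(wrapped_policy::BLOCK_THREADS == 256, "block size comes from the wrapper");
static_assert(wrapped_policy::ITEMS_PER_TILE == 256 * 4, "tile size is derived from both values");
```

Because such a header pulls in nothing else, a consumer like merge_sort.cuh could include it without dragging in the rest of util_device.cuh.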

Contributor

I played a bit with NVRTC, and getting util_device.cuh to compile is surprisingly easy (you just #ifdef out everything from the header except policy_wrapper_t). So I guess pulling out that part makes a lot of sense!

I also tried getting the merge sort headers to compile, but I got stuck at thrust iterators, as always. This is what I meant in today's team meeting when I said we need an overhaul here :)
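
For comparison, the #ifdef alternative mentioned above could be sketched roughly as follows. `__CUDACC_RTC__` is the macro NVRTC defines while compiling; everything else here (the host-only helper `describe_tile`, `example_policy`, and the member bodies) is a hypothetical stand-in rather than the real content of util_device.cuh.

```cpp
// Host-only parts are hidden from NVRTC, which defines __CUDACC_RTC__.
#ifndef __CUDACC_RTC__
#  include <string>

// Hypothetical host-only helper standing in for the code NVRTC rejects.
inline std::string describe_tile(int block_threads, int items_per_thread)
{
  return std::to_string(block_threads) + "x" + std::to_string(items_per_thread);
}
#endif // !__CUDACC_RTC__

// This part stays visible to every compiler, NVRTC included.
// The member bodies are assumed for illustration.
template <typename PolicyT, int BLOCK_THREADS_, int ITEMS_PER_THREAD_ = PolicyT::ITEMS_PER_THREAD>
struct policy_wrapper_t : PolicyT
{
  static constexpr int ITEMS_PER_THREAD = ITEMS_PER_THREAD_;
  static constexpr int BLOCK_THREADS    = BLOCK_THREADS_;
};

// Hypothetical policy used to check that the wrapper works without the guarded code.
struct example_policy
{
  static constexpr int ITEMS_PER_THREAD = 8;
};

static_assert(policy_wrapper_t<example_policy, 128>::ITEMS_PER_THREAD == 8,
              "default is taken from the wrapped policy");
```

The trade-off is that the guard has to be maintained alongside every future addition to the header, whereas a separate header stays NVRTC-clean by construction.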

Contributor Author

Oh, I wasn't aware I could do that. Would you like me to implement this instead of what I have now?

Contributor

No, just pull out policy_wrapper_t, it's fine!

@NaderAlAwar
Contributor Author

I see a lot of thrust functionality being moved into individual headers. Is there a technical reason for this? Is it because of NVRTC?

Yeah, see #3438 (comment)

Contributor

🟩 CI finished in 2h 38m: Pass: 100%/78 | Total: 2d 09h | Avg: 43m 52s | Max: 1h 13m | Hits: 172%/12772
  • 🟩 cub: Pass: 100%/38 | Total: 1d 08h | Avg: 51m 35s | Max: 1h 13m | Hits: 233%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  1d 06h | Avg: 51m 07s | Max:  1h 13m | Hits: 233%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 02m | Avg:  1h 00m | Max:  1h 03m | Hits: 233%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 03m
      🟩 12.6               Pass: 100%/31  | Total:  1d 01h | Avg: 49m 22s | Max:  1h 13m | Hits: 233%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 03m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 02m | Avg:  1h 00m | Max:  1h 03m | Hits: 233%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 03m
      🟩 nvcc12.6           Pass: 100%/29  | Total: 23h 26m | Avg: 48m 30s | Max:  1h 13m | Hits: 233%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 03m
      🟩 nvcc               Pass: 100%/36  | Total:  1d 06h | Avg: 51m 00s | Max:  1h 13m | Hits: 233%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 00m | Avg:  1h 00m | Max:  1h 01m
      🟩 Clang15            Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
      🟩 Clang16            Pass: 100%/1   | Total: 57m 06s | Avg: 57m 06s | Max: 57m 06s
      🟩 Clang17            Pass: 100%/1   | Total: 56m 56s | Avg: 56m 56s | Max: 56m 56s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 50m | Avg: 50m 00s | Max:  1h 03m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 50s | Max:  1h 01m
      🟩 GCC8               Pass: 100%/1   | Total: 53m 16s | Avg: 53m 16s | Max: 53m 16s
      🟩 GCC9               Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 01m
      🟩 GCC10              Pass: 100%/1   | Total: 56m 22s | Avg: 56m 22s | Max: 56m 22s
      🟩 GCC11              Pass: 100%/1   | Total: 54m 27s | Avg: 54m 27s | Max: 54m 27s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 42m | Avg: 34m 09s | Max: 57m 16s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 56m | Avg: 37m 01s | Max:  1h 02m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 03m | Hits: 233%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 13m | Hits: 234%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 03m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total: 12h 45m | Avg: 54m 41s | Max:  1h 03m
      🟩 GCC                Pass: 100%/18  | Total: 13h 20m | Avg: 44m 29s | Max:  1h 02m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 26m | Avg:  1h 06m | Max:  1h 13m | Hits: 233%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 03m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 45m 12s | Avg: 22m 36s | Max: 26m 12s
      🟩 v100               Pass: 100%/36  | Total:  1d 07h | Avg: 53m 11s | Max:  1h 13m | Hits: 233%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  1d 06h | Avg: 58m 24s | Max:  1h 13m | Hits: 233%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 24s | Avg: 22m 24s | Max: 22m 24s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 10s | Avg: 15m 10s | Max: 15m 10s
      🟩 HostLaunch         Pass: 100%/3   | Total: 59m 11s | Avg: 19m 43s | Max: 20m 45s
      🟩 TestGPU            Pass: 100%/2   | Total: 52m 45s | Avg: 26m 22s | Max: 28m 53s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 45m 12s | Avg: 22m 36s | Max: 26m 12s
      🟩 90a                Pass: 100%/1   | Total: 27m 25s | Avg: 27m 25s | Max: 27m 25s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 14h 09m | Avg:  1h 00m | Max:  1h 05m | Hits: 234%/2664  
      🟩 20                 Pass: 100%/24  | Total: 18h 30m | Avg: 46m 16s | Max:  1h 13m | Hits: 230%/888   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 23h 24m | Avg: 37m 58s | Max: 1h 13m | Hits: 149%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 43m 07s | Avg: 21m 33s | Max: 30m 42s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 22h 16m | Avg: 38m 11s | Max:  1h 13m | Hits: 149%/9220  
      🟩 arm64              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 01s | Max: 35m 04s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 33m | Avg: 42m 47s | Max:  1h 09m | Hits:  81%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
      🟩 12.6               Pass: 100%/30  | Total: 17h 26m | Avg: 34m 52s | Max:  1h 13m | Hits: 166%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 07m | Avg: 33m 43s | Max: 33m 46s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 33m | Avg: 42m 47s | Max:  1h 09m | Hits:  81%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
      🟩 nvcc12.6           Pass: 100%/28  | Total: 16h 18m | Avg: 34m 57s | Max:  1h 13m | Hits: 166%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 07m | Avg: 33m 43s | Max: 33m 46s
      🟩 nvcc               Pass: 100%/35  | Total: 22h 17m | Avg: 38m 12s | Max:  1h 13m | Hits: 149%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 23m | Avg: 35m 48s | Max: 37m 07s
      🟩 Clang15            Pass: 100%/1   | Total: 34m 54s | Avg: 34m 54s | Max: 34m 54s
      🟩 Clang16            Pass: 100%/1   | Total: 35m 06s | Avg: 35m 06s | Max: 35m 06s
      🟩 Clang17            Pass: 100%/1   | Total: 36m 36s | Avg: 36m 36s | Max: 36m 36s
      🟩 Clang18            Pass: 100%/7   | Total:  3h 14m | Avg: 27m 43s | Max: 37m 06s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 48s | Max: 35m 26s
      🟩 GCC8               Pass: 100%/1   | Total: 36m 09s | Avg: 36m 09s | Max: 36m 09s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 16m | Avg: 38m 01s | Max: 38m 02s
      🟩 GCC10              Pass: 100%/1   | Total: 39m 25s | Avg: 39m 25s | Max: 39m 25s
      🟩 GCC11              Pass: 100%/1   | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s
      🟩 GCC12              Pass: 100%/1   | Total: 36m 38s | Avg: 36m 38s | Max: 36m 38s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 23m | Avg: 25m 28s | Max: 39m 42s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m | Hits: 109%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 56m | Avg: 58m 43s | Max:  1h 13m | Hits: 175%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  7h 23m | Avg: 31m 42s | Max: 37m 07s
      🟩 GCC                Pass: 100%/16  | Total:  8h 21m | Avg: 31m 22s | Max: 40m 19s
      🟩 MSVC               Pass: 100%/5   | Total:  5h 14m | Avg:  1h 02m | Max:  1h 13m | Hits: 149%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 23h 24m | Avg: 37m 58s | Max:  1h 13m | Hits: 149%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 21h 51m | Avg: 42m 19s | Max:  1h 13m | Hits:  95%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 50m 49s | Avg: 16m 56s | Max: 34m 43s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 42m 04s | Avg: 14m 01s | Max: 16m 32s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 24m 54s | Avg: 24m 54s | Max: 24m 54s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 10h 44m | Avg: 46m 02s | Max:  1h 11m | Hits: 100%/5532  
      🟩 20                 Pass: 100%/21  | Total: 11h 57m | Avg: 34m 08s | Max:  1h 13m | Hits: 221%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 56s | Avg: 4m 58s | Max: 7m 53s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  7m 53s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  7m 53s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  7m 53s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  7m 53s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  7m 53s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  7m 53s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  7m 53s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s
      🟩 Test               Pass: 100%/1   | Total:  7m 53s | Avg:  7m 53s | Max:  7m 53s
    
  • 🟩 python: Pass: 100%/1 | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@shwina left a comment

LGTM! Nice work

@NaderAlAwar enabled auto-merge (squash) January 23, 2025 16:46
@NaderAlAwar removed the request for review from gevtushenko January 23, 2025 17:14

🟩 CI finished in 4h 11m: Pass: 100%/78 | Total: 2d 07h | Avg: 42m 40s | Max: 1h 15m | Hits: 164%/12772
  • 🟩 cub: Pass: 100%/38 | Total: 1d 08h | Avg: 51m 22s | Max: 1h 11m | Hits: 196%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  1d 06h | Avg: 51m 04s | Max:  1h 11m | Hits: 196%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 53m | Avg: 56m 52s | Max: 57m 07s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 54m | Avg: 58m 58s | Max:  1h 03m | Hits: 196%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 11m
      🟩 12.6               Pass: 100%/31  | Total:  1d 01h | Avg: 48m 58s | Max:  1h 07m | Hits: 196%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 57m | Avg: 58m 34s | Max: 58m 48s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 54m | Avg: 58m 58s | Max:  1h 03m | Hits: 196%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 11m
      🟩 nvcc12.6           Pass: 100%/29  | Total: 23h 21m | Avg: 48m 18s | Max:  1h 07m | Hits: 196%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 34s | Max: 58m 48s
      🟩 nvcc               Pass: 100%/36  | Total:  1d 06h | Avg: 50m 58s | Max:  1h 11m | Hits: 196%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 51m | Avg: 57m 56s | Max:  1h 00m
      🟩 Clang15            Pass: 100%/1   | Total: 58m 53s | Avg: 58m 53s | Max: 58m 53s
      🟩 Clang16            Pass: 100%/1   | Total: 56m 20s | Avg: 56m 20s | Max: 56m 20s
      🟩 Clang17            Pass: 100%/1   | Total: 57m 09s | Avg: 57m 09s | Max: 57m 09s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 33m | Avg: 47m 40s | Max: 58m 48s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 54m | Avg: 57m 25s | Max: 58m 40s
      🟩 GCC8               Pass: 100%/1   | Total: 54m 30s | Avg: 54m 30s | Max: 54m 30s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 56m | Avg: 58m 10s | Max:  1h 00m
      🟩 GCC10              Pass: 100%/1   | Total: 55m 14s | Avg: 55m 14s | Max: 55m 14s
      🟩 GCC11              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
      🟩 GCC12              Pass: 100%/3   | Total:  2h 03m | Avg: 41m 00s | Max: 55m 31s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 49m | Avg: 36m 10s | Max:  1h 01m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 03m | Hits: 196%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 07m | Hits: 196%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 11m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total: 12h 17m | Avg: 52m 42s | Max:  1h 00m
      🟩 GCC                Pass: 100%/18  | Total: 13h 34m | Avg: 45m 13s | Max:  1h 01m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 21m | Avg:  1h 05m | Max:  1h 07m | Hits: 196%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 11m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 45s | Max: 41m 57s
      🟩 v100               Pass: 100%/36  | Total:  1d 07h | Avg: 52m 21s | Max:  1h 11m | Hits: 196%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  1d 05h | Avg: 57m 15s | Max:  1h 11m | Hits: 196%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 45s | Avg: 19m 45s | Max: 19m 45s
      🟩 GraphCapture       Pass: 100%/1   | Total: 18m 08s | Avg: 18m 08s | Max: 18m 08s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 25m | Avg: 28m 34s | Max: 41m 57s
      🟩 TestGPU            Pass: 100%/2   | Total: 53m 39s | Avg: 26m 49s | Max: 28m 41s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total:  1h 07m | Avg: 33m 45s | Max: 41m 57s
      🟩 90a                Pass: 100%/1   | Total: 23m 22s | Avg: 23m 22s | Max: 23m 22s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 14h 03m | Avg:  1h 00m | Max:  1h 11m | Hits: 196%/2664  
      🟩 20                 Pass: 100%/24  | Total: 18h 28m | Avg: 46m 11s | Max:  1h 07m | Hits: 195%/888   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 21h 52m | Avg: 35m 28s | Max: 1h 15m | Hits: 152%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 41s | Avg: 20m 50s | Max: 27m 40s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 20h 48m | Avg: 35m 40s | Max:  1h 15m | Hits: 152%/9220  
      🟩 arm64              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 00s | Max: 34m 00s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 18m | Avg: 39m 41s | Max:  1h 08m | Hits:  90%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 11m
      🟩 12.6               Pass: 100%/30  | Total: 16h 11m | Avg: 32m 23s | Max:  1h 15m | Hits: 167%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 01m | Avg: 30m 51s | Max: 31m 50s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 18m | Avg: 39m 41s | Max:  1h 08m | Hits:  90%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 11m
      🟩 nvcc12.6           Pass: 100%/28  | Total: 15h 10m | Avg: 32m 30s | Max:  1h 15m | Hits: 167%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 01m | Avg: 30m 51s | Max: 31m 50s
      🟩 nvcc               Pass: 100%/35  | Total: 20h 51m | Avg: 35m 44s | Max:  1h 15m | Hits: 152%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 07m | Avg: 31m 45s | Max: 33m 22s
      🟩 Clang15            Pass: 100%/1   | Total: 32m 27s | Avg: 32m 27s | Max: 32m 27s
      🟩 Clang16            Pass: 100%/1   | Total: 34m 21s | Avg: 34m 21s | Max: 34m 21s
      🟩 Clang17            Pass: 100%/1   | Total: 34m 32s | Avg: 34m 32s | Max: 34m 32s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 55m | Avg: 25m 04s | Max: 33m 03s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 35s | Max: 34m 13s
      🟩 GCC8               Pass: 100%/1   | Total: 31m 01s | Avg: 31m 01s | Max: 31m 01s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 41s | Max: 33m 17s
      🟩 GCC10              Pass: 100%/1   | Total: 35m 31s | Avg: 35m 31s | Max: 35m 31s
      🟩 GCC11              Pass: 100%/1   | Total: 36m 28s | Avg: 36m 28s | Max: 36m 28s
      🟩 GCC12              Pass: 100%/1   | Total: 34m 28s | Avg: 34m 28s | Max: 34m 28s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 07m | Avg: 23m 24s | Max: 37m 20s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 08m | Hits: 105%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 54m | Avg: 58m 18s | Max:  1h 15m | Hits: 183%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 11m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  6h 43m | Avg: 28m 51s | Max: 34m 32s
      🟩 GCC                Pass: 100%/16  | Total:  7h 35m | Avg: 28m 27s | Max: 37m 20s
      🟩 MSVC               Pass: 100%/5   | Total:  5h 11m | Avg:  1h 02m | Max:  1h 15m | Hits: 152%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 11m
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 21h 52m | Avg: 35m 28s | Max:  1h 15m | Hits: 152%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 20h 24m | Avg: 39m 30s | Max:  1h 15m | Hits:  98%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 46s | Avg: 16m 35s | Max: 33m 55s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 38m 08s | Avg: 12m 42s | Max: 14m 01s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 20m 00s | Avg: 20m 00s | Max: 20m 00s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  9h 53m | Avg: 42m 25s | Max:  1h 11m | Hits: 101%/5532  
      🟩 20                 Pass: 100%/21  | Total: 11h 17m | Avg: 32m 14s | Max:  1h 15m | Hits: 228%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 35s | Avg: 5m 47s | Max: 9m 21s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  9m 21s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  9m 21s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  9m 21s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  9m 21s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  9m 21s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  9m 21s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  9m 21s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 14s | Avg:  2m 14s | Max:  2m 14s
      🟩 Test               Pass: 100%/1   | Total:  9m 21s | Avg:  9m 21s | Max:  9m 21s
    
  • 🟩 python: Pass: 100%/1 | Total: 52m 30s | Avg: 52m 30s | Max: 52m 30s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 52m 30s | Avg: 52m 30s | Max: 52m 30s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 52m 30s | Avg: 52m 30s | Max: 52m 30s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 52m 30s | Avg: 52m 30s | Max: 52m 30s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 52m 30s | Avg: 52m 30s | Max: 52m 30s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 52m 30s | Avg: 52m 30s | Max: 52m 30s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 52m 30s | Avg: 52m 30s | Max: 52m 30s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 52m 30s | Avg: 52m 30s | Max: 52m 30s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 52m 30s | Avg: 52m 30s | Max: 52m 30s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@NaderAlAwar merged commit fd7ad82 into NVIDIA:main Jan 23, 2025
89 of 92 checks passed
davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 29, 2025
* Move merge_sort kernels to separate file

* Add merge_sort nvrtc test

* Remove include that contains host code and replace with cuda::std

* Remove unneeded headers from merge_sort header

* Move LoadIterator to separate header and replace include

* Add host device macro to has_nested_type to fix nvrtc issue

* Extract make_load_iterator into separate file to avoid nvrtc error

* Extract is_thrust_pointer into separate file to avoid nvrtc error

* Extract policy_wrapper_t into separate file, forward declare LoadIterator, and use ::cuda::std instead of std to avoid nvrtc errors

* Extract unwrap_contiguous_iterator into separate file to avoid nvrtc errors

* Add missing include following header reorganization

* Add comment explaining why we forward declare make_load_iterator

* Add missing iterator include

* Add missing thrust config include

* Use is_same_v and rearrange include according to formatter

* Add missing comment to endif

* Use SPDX license instead of longer one

* Use nested namespace specifier

* Use nested namespace specifiers and _v suffix in other files
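
Several of the commits above share one pattern: a small trait is annotated so it can be used from device code, wrapped in a C++17 nested namespace specifier, and exposed through a `_v` variable template. The sketch below illustrates that pattern only. It is not the code from this PR: `DEMO_HOST_DEVICE`, the `demo::detail` namespace, `dispatch_tag`, and this `has_nested_type` definition are hypothetical stand-ins. It uses `std::` so that it builds with a plain host compiler; under NVRTC the equivalent facilities would presumably come from `::cuda::std` (for example `<cuda/std/type_traits>`), as the commit messages describe.

```cpp
#include <type_traits>

// Hypothetical macro for this sketch (not a CCCL macro): expands to the CUDA
// execution-space annotations when compiled as CUDA, and to nothing otherwise.
#if defined(__CUDACC__)
#  define DEMO_HOST_DEVICE __host__ __device__
#else
#  define DEMO_HOST_DEVICE
#endif

// C++17 nested namespace specifier instead of `namespace demo { namespace detail {`.
namespace demo::detail
{
// Primary template: T has no nested `type` member.
template <typename T, typename = void>
struct has_nested_type : std::false_type
{};

// Specialization selected when `typename T::type` is well-formed.
template <typename T>
struct has_nested_type<T, std::void_t<typename T::type>> : std::true_type
{};

// `_v` variable template, mirroring the standard library convention.
template <typename T>
inline constexpr bool has_nested_type_v = has_nested_type<T>::value;

// A function usable from host and device code that branches on the trait.
template <typename T>
DEMO_HOST_DEVICE constexpr int dispatch_tag()
{
  if constexpr (has_nested_type_v<T>)
  {
    return 1;
  }
  else
  {
    return 0;
  }
}
} // namespace demo::detail

struct with_type
{
  using type = int;
};

struct without_type
{};

// Compile-time checks of the trait and the dispatch helper.
static_assert(demo::detail::has_nested_type_v<with_type>);
static_assert(!demo::detail::has_nested_type_v<without_type>);
static_assert(demo::detail::dispatch_tag<with_type>() == 1);
static_assert(demo::detail::dispatch_tag<without_type>() == 0);
```

The checks are all `static_assert`s, so the snippet is verified at compile time and needs no device to run.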