Adds support for large num items to DeviceMerge #3530

Merged (6 commits) on Jan 30, 2025

@elstehle (Collaborator) commented Jan 27, 2025

Description

Closes #3134

Decided to go with a single offset-type template instantiation:

  • Performance for merging pairs with the wider (64-bit) offset type is, on average, marginally better than with the 32-bit offset type
  • Choosing the offset type statically is difficult: we take the size of the LHS and the size of the RHS separately, so even if both are int, the output size may require 64-bit indexing
  • We only need to compile a single kernel, independent of the user-provided offset type
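
The second point above can be illustrated with a small host-only sketch. It is not code from this PR: `offset_t`, `merged_size`, and `fits_in_int32` are hypothetical names, and the sketch assumes `int` is 32 bits wide. It only shows that two input sizes that each fit in `int` can add up to an output size that does not.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical alias for illustration: one fixed 64-bit offset type,
// used no matter which integer types the caller passes for the input sizes.
using offset_t = std::int64_t;

// Hypothetical helper (not part of CUB): number of output items of a merge.
// Both operands are widened before the addition, so the sum cannot overflow
// the 32-bit range (assuming int is 32 bits wide).
constexpr offset_t merged_size(int num_lhs, int num_rhs)
{
  return static_cast<offset_t>(num_lhs) + static_cast<offset_t>(num_rhs);
}

// True if the given item count can still be indexed with a 32-bit signed offset.
constexpr bool fits_in_int32(offset_t n)
{
  return n <= std::numeric_limits<std::int32_t>::max();
}

// Each input holds 2^30 items and fits in int, but the merged output holds
// 2^31 items, which is one more than the largest 32-bit signed value.
static_assert(fits_in_int32(1 << 30), "each input fits in 32 bits");
static_assert(!fits_in_int32(merged_size(1 << 30, 1 << 30)),
              "the merged output needs a 64-bit offset");
```

Because the required width depends on the sum of two runtime values, it cannot be derived from the input size types alone, which is consistent with the choice of a single wide offset type described above.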

Performance summary for merge.pairs on H100:

x   | i64/i32 time | i64/i32 time (2^28)
min | 74.39%       | 74.39%
max | 105.08%      | 105.08%
avg | 99.98%       | 99.42%

Performance summary for merge.keys on H100:

x   | i64/i32 time | i64/i32 time (2^28)
min | 98.98%       | 99.98%
max | 107.28%      | 107.28%
avg | 100.91%      | 101.30%

@elstehle requested a review from a team as a code owner January 27, 2025 07:11
Contributor

🟩 CI finished in 1h 41m: Pass: 100%/90 | Total: 2d 02h | Avg: 33m 52s | Max: 1h 00m | Hits: 312%/12772
  • 🟩 cub: Pass: 100%/44 | Total: 1d 03h | Avg: 37m 09s | Max: 59m 07s | Hits: 375%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 02h | Avg: 37m 14s | Max: 59m 07s | Hits: 375%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 08s | Max: 37m 13s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 37m | Avg: 43m 28s | Max: 53m 29s | Hits: 375%/888   
      🟩 12.5               Pass: 100%/2   | Total:  1h 54m | Avg: 57m 14s | Max: 59m 07s
      🟩 12.6               Pass: 100%/37  | Total: 21h 42m | Avg: 35m 12s | Max: 51m 24s | Hits: 375%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 47m 35s | Avg: 23m 47s | Max: 24m 18s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 37m | Avg: 43m 28s | Max: 53m 29s | Hits: 375%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 54m | Avg: 57m 14s | Max: 59m 07s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 20h 55m | Avg: 35m 51s | Max: 51m 24s | Hits: 375%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 35s | Avg: 23m 47s | Max: 24m 18s
      🟩 nvcc               Pass: 100%/42  | Total:  1d 02h | Avg: 37m 47s | Max: 59m 07s | Hits: 375%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 32m | Avg: 38m 14s | Max: 41m 17s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 15m | Avg: 37m 57s | Max: 40m 47s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 16m | Avg: 38m 25s | Max: 40m 50s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 18m | Avg: 39m 13s | Max: 40m 34s
      🟩 Clang18            Pass: 100%/7   | Total:  3h 21m | Avg: 28m 43s | Max: 36m 52s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 20m | Avg: 40m 11s | Max: 40m 25s
      🟩 GCC8               Pass: 100%/1   | Total: 41m 01s | Avg: 41m 01s | Max: 41m 01s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 24m | Avg: 42m 12s | Max: 42m 39s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 24m | Avg: 42m 14s | Max: 43m 39s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 16m | Avg: 38m 24s | Max: 40m 01s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 07m | Avg: 31m 48s | Max: 43m 02s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 58m | Avg: 29m 47s | Max: 42m 59s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 43m | Avg: 51m 37s | Max: 53m 29s | Hits: 375%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 39m | Avg: 49m 32s | Max: 51m 24s | Hits: 375%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 14s | Max: 59m 07s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  9h 45m | Avg: 34m 25s | Max: 41m 17s
      🟩 GCC                Pass: 100%/21  | Total: 12h 12m | Avg: 34m 53s | Max: 43m 39s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 22m | Avg: 50m 34s | Max: 53m 29s | Hits: 375%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 54m | Avg: 57m 14s | Max: 59m 07s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 42m 00s | Avg: 21m 00s | Max: 21m 10s
      🟩 v100               Pass: 100%/42  | Total:  1d 02h | Avg: 37m 55s | Max: 59m 07s | Hits: 375%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 00h | Avg: 39m 39s | Max: 59m 07s | Hits: 375%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 28m 26s | Avg: 28m 26s | Max: 28m 26s
      🟩 GraphCapture       Pass: 100%/1   | Total: 18m 33s | Avg: 18m 33s | Max: 18m 33s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 11m | Avg: 23m 53s | Max: 25m 36s
      🟩 TestGPU            Pass: 100%/2   | Total: 48m 31s | Avg: 24m 15s | Max: 24m 26s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 42m 00s | Avg: 21m 00s | Max: 21m 10s
      🟩 90a                Pass: 100%/1   | Total: 19m 53s | Avg: 19m 53s | Max: 19m 53s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 13h 44m | Avg: 41m 14s | Max: 55m 21s | Hits: 375%/2664  
      🟩 20                 Pass: 100%/24  | Total: 13h 29m | Avg: 33m 44s | Max: 59m 07s | Hits: 374%/888   
    
  • 🟩 thrust: Pass: 100%/43 | Total: 22h 44m | Avg: 31m 43s | Max: 1h 00m | Hits: 288%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 40m 21s | Avg: 20m 10s | Max: 27m 03s
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total: 21h 47m | Avg: 31m 52s | Max:  1h 00m | Hits: 288%/9220  
      🟩 arm64              Pass: 100%/2   | Total: 56m 49s | Avg: 28m 24s | Max: 29m 36s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 03m | Avg: 36m 44s | Max: 51m 40s | Hits: 269%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  1h 46m | Avg: 53m 00s | Max: 55m 12s
      🟩 12.6               Pass: 100%/36  | Total: 17h 54m | Avg: 29m 50s | Max:  1h 00m | Hits: 293%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 53m 25s | Avg: 26m 42s | Max: 27m 10s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 03m | Avg: 36m 44s | Max: 51m 40s | Hits: 269%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 46m | Avg: 53m 00s | Max: 55m 12s
      🟩 nvcc12.6           Pass: 100%/34  | Total: 17h 00m | Avg: 30m 01s | Max:  1h 00m | Hits: 293%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 25s | Avg: 26m 42s | Max: 27m 10s
      🟩 nvcc               Pass: 100%/41  | Total: 21h 50m | Avg: 31m 57s | Max:  1h 00m | Hits: 288%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 55s | Max: 32m 37s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 04m | Avg: 32m 14s | Max: 32m 39s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 05s | Max: 32m 38s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 01m | Avg: 30m 52s | Max: 31m 30s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 46m | Avg: 23m 50s | Max: 31m 49s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 05s | Max: 33m 45s
      🟩 GCC8               Pass: 100%/1   | Total: 28m 46s | Avg: 28m 46s | Max: 28m 46s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 04m | Avg: 32m 20s | Max: 33m 49s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 49s | Max: 32m 06s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 00s | Max: 32m 33s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 19s | Max: 34m 38s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 54m | Avg: 21m 45s | Max: 33m 34s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 44m | Avg: 52m 07s | Max: 52m 34s | Hits: 269%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 30m | Avg: 50m 17s | Max:  1h 00m | Hits: 301%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 46m | Avg: 53m 00s | Max: 55m 12s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 58m | Avg: 28m 10s | Max: 32m 39s
      🟩 GCC                Pass: 100%/19  | Total:  8h 43m | Avg: 27m 34s | Max: 34m 38s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 15m | Avg: 51m 01s | Max:  1h 00m | Hits: 288%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 46m | Avg: 53m 00s | Max: 55m 12s
    🟩 gpu
      🟩 v100               Pass: 100%/43  | Total: 22h 44m | Avg: 31m 43s | Max:  1h 00m | Hits: 288%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 21h 08m | Avg: 34m 17s | Max:  1h 00m | Hits: 269%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 51m 42s | Avg: 17m 14s | Max: 35m 40s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 43m 22s | Avg: 14m 27s | Max: 17m 27s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 17m 27s | Avg: 17m 27s | Max: 17m 27s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 11h 48m | Avg: 35m 24s | Max: 54m 12s | Hits: 269%/5532  
      🟩 20                 Pass: 100%/21  | Total: 10h 15m | Avg: 29m 18s | Max:  1h 00m | Hits: 317%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 41s | Avg: 4m 50s | Max: 7m 29s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  7m 29s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  7m 29s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  7m 29s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  7m 29s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  7m 29s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  7m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  7m 29s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 12s | Avg:  2m 12s | Max:  2m 12s
      🟩 Test               Pass: 100%/1   | Total:  7m 29s | Avg:  7m 29s | Max:  7m 29s
    
  • 🟩 python: Pass: 100%/1 | Total: 40m 29s | Avg: 40m 29s | Max: 40m 29s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 40m 29s | Avg: 40m 29s | Max: 40m 29s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 40m 29s | Avg: 40m 29s | Max: 40m 29s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 40m 29s | Avg: 40m 29s | Max: 40m 29s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 40m 29s | Avg: 40m 29s | Max: 40m 29s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 40m 29s | Avg: 40m 29s | Max: 40m 29s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 40m 29s | Avg: 40m 29s | Max: 40m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 40m 29s | Avg: 40m 29s | Max: 40m 29s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 40m 29s | Avg: 40m 29s | Max: 40m 29s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

cub/test/catch2_test_device_merge.cu: two review threads, outdated and resolved
Contributor

🟨 CI finished in 2h 27m: Pass: 98%/90 | Total: 1d 11h | Avg: 23m 36s | Max: 1h 02m | Hits: 421%/10928
  • 🟨 thrust: Pass: 97%/43 | Total: 6h 49m | Avg: 9m 30s | Max: 32m 29s | Hits: 365%/7376

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/41  | Total:  6h 39m | Avg:  9m 44s | Max: 32m 29s | Hits: 365%/7376  
      🟩 arm64              Pass: 100%/2   | Total:  9m 36s | Avg:  4m 48s | Max:  5m 00s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total: 44m 42s | Avg:  8m 56s | Max: 23m 57s | Hits: 365%/1844  
      🟩 12.5               Pass: 100%/2   | Total: 28m 59s | Avg: 14m 29s | Max: 15m 09s
      🔍 12.6               Pass:  97%/36  | Total:  5h 35m | Avg:  9m 19s | Max: 32m 29s | Hits: 365%/5532  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  5m 24s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 44m 42s | Avg:  8m 56s | Max: 23m 57s | Hits: 365%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 59s | Avg: 14m 29s | Max: 15m 09s
      🔍 nvcc12.6           Pass:  97%/34  | Total:  5h 24m | Avg:  9m 33s | Max: 32m 29s | Hits: 365%/5532  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  5m 24s
      🔍 nvcc               Pass:  97%/41  | Total:  6h 38m | Avg:  9m 43s | Max: 32m 29s | Hits: 365%/7376  
    🔍 cxx: MSVC14.39 🔍
      🟩 Clang14            Pass: 100%/4   | Total: 21m 40s | Avg:  5m 25s | Max:  5m 47s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 44s | Avg:  5m 52s | Max:  6m 00s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 52s | Avg:  5m 56s | Max:  6m 06s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 41s
      🟩 Clang18            Pass: 100%/7   | Total: 50m 58s | Avg:  7m 16s | Max: 16m 20s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 54s | Avg:  5m 27s | Max:  5m 28s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 55s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 49s | Avg:  5m 54s | Max:  6m 16s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 31s | Avg:  5m 45s | Max:  6m 06s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 14s | Avg:  6m 07s | Max:  6m 13s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 04m | Avg:  8m 07s | Max: 16m 22s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 53m 11s | Avg: 26m 35s | Max: 29m 14s | Hits: 365%/3688  
      🔍 MSVC14.39          Pass:  66%/3   | Total:  1h 31m | Avg: 30m 36s | Max: 32m 29s | Hits: 365%/3688  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 59s | Avg: 14m 29s | Max: 15m 09s
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/17  | Total:  1h 47m | Avg:  6m 18s | Max: 16m 20s
      🟩 GCC                Pass: 100%/19  | Total:  2h 07m | Avg:  6m 44s | Max: 16m 22s
      🔍 MSVC               Pass:  80%/5   | Total:  2h 24m | Avg: 28m 59s | Max: 32m 29s | Hits: 365%/7376  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 59s | Avg: 14m 29s | Max: 15m 09s
    🔍 jobs: TestCPU 🔍
      🟩 Build              Pass: 100%/37  | Total:  5h 15m | Avg:  8m 31s | Max: 30m 17s | Hits: 365%/7376  
      🔍 TestCPU            Pass:  66%/3   | Total: 48m 17s | Avg: 16m 05s | Max: 32m 29s
      🟩 TestGPU            Pass: 100%/3   | Total: 45m 21s | Avg: 15m 07s | Max: 16m 22s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total:  3h 10m | Avg:  9m 30s | Max: 30m 17s | Hits: 365%/5532  
      🔍 20                 Pass:  95%/21  | Total:  3h 20m | Avg:  9m 32s | Max: 32m 29s | Hits: 365%/1844  
    🟨 gpu
      🟨 v100               Pass:  97%/43  | Total:  6h 49m | Avg:  9m 30s | Max: 32m 29s | Hits: 365%/7376  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 18m 32s | Avg:  9m 16s | Max: 12m 39s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 46s | Avg:  4m 46s | Max:  4m 46s
    
  • 🟩 cub: Pass: 100%/44 | Total: 1d 03h | Avg: 37m 33s | Max: 1h 02m | Hits: 538%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 02h | Avg: 37m 15s | Max:  1h 02m | Hits: 538%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 27m | Avg: 43m 49s | Max: 43m 55s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 29m | Avg: 41m 50s | Max: 59m 41s | Hits: 538%/888   
      🟩 12.5               Pass: 100%/2   | Total:  1h 18m | Avg: 39m 20s | Max: 39m 58s
      🟩 12.6               Pass: 100%/37  | Total: 22h 44m | Avg: 36m 53s | Max:  1h 02m | Hits: 538%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 45m | Avg: 52m 32s | Max: 54m 12s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 29m | Avg: 41m 50s | Max: 59m 41s | Hits: 538%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 18m | Avg: 39m 20s | Max: 39m 58s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 20h 59m | Avg: 35m 59s | Max:  1h 02m | Hits: 538%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 45m | Avg: 52m 32s | Max: 54m 12s
      🟩 nvcc               Pass: 100%/42  | Total:  1d 01h | Avg: 36m 51s | Max:  1h 02m | Hits: 538%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 24m | Avg: 36m 04s | Max: 38m 33s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 10m | Avg: 35m 28s | Max: 35m 58s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 13m | Avg: 36m 39s | Max: 38m 39s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 10m | Avg: 35m 25s | Max: 35m 42s
      🟩 Clang18            Pass: 100%/7   | Total:  4h 49m | Avg: 41m 18s | Max: 54m 12s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 13m | Avg: 36m 57s | Max: 38m 56s
      🟩 GCC8               Pass: 100%/1   | Total: 36m 30s | Avg: 36m 30s | Max: 36m 30s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 15m | Avg: 37m 34s | Max: 39m 22s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 19m | Avg: 39m 30s | Max: 42m 01s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 42s | Max: 36m 37s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 47m | Avg: 26m 50s | Max: 37m 13s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 59m | Avg: 29m 54s | Max: 43m 55s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 04s | Max: 59m 41s | Hits: 538%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 02m | Hits: 538%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 18m | Avg: 39m 20s | Max: 39m 58s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 10h 48m | Avg: 38m 08s | Max: 54m 12s
      🟩 GCC                Pass: 100%/21  | Total: 11h 22m | Avg: 32m 30s | Max: 43m 55s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 03m | Avg:  1h 00m | Max:  1h 02m | Hits: 538%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 18m | Avg: 39m 20s | Max: 39m 58s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 34m 48s | Avg: 17m 24s | Max: 20m 56s
      🟩 v100               Pass: 100%/42  | Total:  1d 02h | Avg: 38m 31s | Max:  1h 02m | Hits: 538%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 00h | Avg: 39m 14s | Max:  1h 02m | Hits: 538%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 26m 58s | Avg: 26m 58s | Max: 26m 58s
      🟩 GraphCapture       Pass: 100%/1   | Total: 19m 32s | Avg: 19m 32s | Max: 19m 32s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 34s | Max: 26m 18s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 23m | Avg: 41m 44s | Max: 47m 26s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 34m 48s | Avg: 17m 24s | Max: 20m 56s
      🟩 90a                Pass: 100%/1   | Total: 14m 03s | Avg: 14m 03s | Max: 14m 03s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 13h 41m | Avg: 41m 05s | Max:  1h 02m | Hits: 538%/2664  
      🟩 20                 Pass: 100%/24  | Total: 13h 50m | Avg: 34m 37s | Max:  1h 02m | Hits: 538%/888   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 19s | Avg: 5m 39s | Max: 9m 06s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  9m 06s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  9m 06s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  9m 06s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  9m 06s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  9m 06s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  9m 06s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  9m 06s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 13s | Avg:  2m 13s | Max:  2m 13s
      🟩 Test               Pass: 100%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s
    
  • 🟩 python: Pass: 100%/1 | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s
    


@elstehle mentioned this pull request Jan 29, 2025
25 tasks
Contributor

🟨 CI finished in 1d 01h: Pass: 98%/89 | Total: 2d 13h | Avg: 41m 26s | Max: 1h 14m | Hits: 303%/10928
  • 🟨 cub: Pass: 97%/44 | Total: 1d 13h | Avg: 51m 32s | Max: 1h 14m | Hits: 375%/3552

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/42  | Total:  1d 11h | Avg: 51m 18s | Max:  1h 14m | Hits: 375%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 26s | Max: 57m 31s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  4h 52m | Avg: 58m 27s | Max:  1h 02m | Hits: 375%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 14m
      🔍 12.6               Pass:  97%/37  | Total:  1d 06h | Avg: 49m 38s | Max:  1h 07m | Hits: 375%/2664  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 58m | Avg: 59m 23s | Max:  1h 00m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 52m | Avg: 58m 27s | Max:  1h 02m | Hits: 375%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 14m
      🔍 nvcc12.6           Pass:  97%/35  | Total:  1d 04h | Avg: 49m 05s | Max:  1h 07m | Hits: 375%/2664  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 23s | Max:  1h 00m
      🔍 nvcc               Pass:  97%/42  | Total:  1d 11h | Avg: 51m 10s | Max:  1h 14m | Hits: 375%/3552  
    🔍 cxx: GCC12 🔍
      🟩 Clang14            Pass: 100%/4   | Total:  3h 49m | Avg: 57m 17s | Max: 58m 21s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 41s | Max: 58m 07s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 53m | Avg: 56m 55s | Max: 59m 08s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 56m | Avg: 58m 16s | Max:  1h 00m
      🟩 Clang18            Pass: 100%/7   | Total:  5h 45m | Avg: 49m 21s | Max:  1h 04m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 48m | Avg: 54m 14s | Max: 55m 02s
      🟩 GCC8               Pass: 100%/1   | Total: 57m 34s | Avg: 57m 34s | Max: 57m 34s
      🟩 GCC9               Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m
      🟩 GCC10              Pass: 100%/2   | Total:  1h 47m | Avg: 53m 54s | Max: 55m 02s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 52s | Max:  1h 01m
      🔍 GCC12              Pass:  75%/4   | Total:  2h 17m | Avg: 34m 20s | Max:  1h 00m
      🟩 GCC13              Pass: 100%/8   | Total:  4h 57m | Avg: 37m 13s | Max:  1h 03m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits: 375%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 07m | Hits: 375%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 14m
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/17  | Total: 15h 20m | Avg: 54m 08s | Max:  1h 04m
      🔍 GCC                Pass:  95%/21  | Total: 15h 49m | Avg: 45m 13s | Max:  1h 03m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 18m | Avg:  1h 04m | Max:  1h 07m | Hits: 375%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 14m
    🔍 gpu: h100 🔍
      🔍 h100               Pass:  50%/2   | Total: 24m 28s | Avg: 12m 14s | Max: 24m 28s
      🟩 v100               Pass: 100%/42  | Total:  1d 13h | Avg: 53m 24s | Max:  1h 14m | Hits: 375%/3552  
    🔍 jobs: HostLaunch 🔍
      🟩 Build              Pass: 100%/37  | Total:  1d 11h | Avg: 57m 26s | Max:  1h 14m | Hits: 375%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 27m 00s | Avg: 27m 00s | Max: 27m 00s
      🟩 GraphCapture       Pass: 100%/1   | Total: 20m 50s | Avg: 20m 50s | Max: 20m 50s
      🔍 HostLaunch         Pass:  66%/3   | Total: 46m 12s | Avg: 15m 24s | Max: 25m 56s
      🟩 TestGPU            Pass: 100%/2   | Total: 48m 29s | Avg: 24m 14s | Max: 25m 29s
    🔍 sm: 90 🔍
      🔍 90                 Pass:  50%/2   | Total: 24m 28s | Avg: 12m 14s | Max: 24m 28s
      🟩 90a                Pass: 100%/1   | Total: 24m 27s | Avg: 24m 27s | Max: 24m 27s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 19h 42m | Avg: 59m 08s | Max:  1h 14m | Hits: 375%/2664  
      🔍 20                 Pass:  95%/24  | Total: 18h 05m | Avg: 45m 12s | Max:  1h 07m | Hits: 374%/888   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 22h 38m | Avg: 32m 20s | Max: 1h 01m | Hits: 269%/7376

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 39m 25s | Avg: 19m 42s | Max: 23m 22s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total: 21h 42m | Avg: 32m 33s | Max:  1h 01m | Hits: 269%/7376  
      🟩 arm64              Pass: 100%/2   | Total: 55m 46s | Avg: 27m 53s | Max: 28m 57s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 59m | Avg: 35m 59s | Max: 55m 14s | Hits: 269%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  1h 45m | Avg: 52m 52s | Max: 54m 25s
      🟩 12.6               Pass: 100%/35  | Total: 17h 52m | Avg: 30m 38s | Max:  1h 01m | Hits: 269%/5532  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 49m 02s | Avg: 24m 31s | Max: 25m 15s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 59m | Avg: 35m 59s | Max: 55m 14s | Hits: 269%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 45m | Avg: 52m 52s | Max: 54m 25s
      🟩 nvcc12.6           Pass: 100%/33  | Total: 17h 03m | Avg: 31m 00s | Max:  1h 01m | Hits: 269%/5532  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 49m 02s | Avg: 24m 31s | Max: 25m 15s
      🟩 nvcc               Pass: 100%/40  | Total: 21h 49m | Avg: 32m 43s | Max:  1h 01m | Hits: 269%/7376  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 57m | Avg: 29m 27s | Max: 30m 22s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 07m | Avg: 33m 30s | Max: 35m 40s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 03m | Avg: 31m 32s | Max: 32m 30s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 08m | Avg: 34m 20s | Max: 36m 46s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 37m | Avg: 22m 30s | Max: 30m 35s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 03m | Avg: 31m 31s | Max: 32m 28s
      🟩 GCC8               Pass: 100%/1   | Total: 32m 13s | Avg: 32m 13s | Max: 32m 13s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 35s | Max: 35m 14s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 48s | Max: 32m 37s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 32s | Max: 33m 39s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 34s | Max: 35m 10s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 02m | Avg: 22m 49s | Max: 34m 46s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 35s | Max:  1h 01m | Hits: 269%/3688  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 01m | Hits: 269%/3688  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 45m | Avg: 52m 52s | Max: 54m 25s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 54m | Avg: 27m 53s | Max: 36m 46s
      🟩 GCC                Pass: 100%/19  | Total:  9h 00m | Avg: 28m 27s | Max: 35m 14s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 57m | Avg: 59m 23s | Max:  1h 01m | Hits: 269%/7376  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 45m | Avg: 52m 52s | Max: 54m 25s
    🟩 gpu
      🟩 v100               Pass: 100%/42  | Total: 22h 38m | Avg: 32m 20s | Max:  1h 01m | Hits: 269%/7376  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 21h 31m | Avg: 34m 54s | Max:  1h 01m | Hits: 269%/7376  
      🟩 TestCPU            Pass: 100%/2   | Total: 16m 04s | Avg:  8m 02s | Max:  8m 10s
      🟩 TestGPU            Pass: 100%/3   | Total: 50m 46s | Avg: 16m 55s | Max: 20m 54s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 18m 44s | Avg: 18m 44s | Max: 18m 44s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 13m | Avg: 36m 41s | Max:  1h 01m | Hits: 269%/5532  
      🟩 20                 Pass: 100%/20  | Total:  9h 44m | Avg: 29m 14s | Max: 58m 48s | Hits: 269%/1844  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 53s | Avg: 6m 26s | Max: 10m 38s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 53s | Avg:  6m 26s | Max: 10m 38s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 12m 53s | Avg:  6m 26s | Max: 10m 38s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 12m 53s | Avg:  6m 26s | Max: 10m 38s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 53s | Avg:  6m 26s | Max: 10m 38s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 53s | Avg:  6m 26s | Max: 10m 38s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 53s | Avg:  6m 26s | Max: 10m 38s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 12m 53s | Avg:  6m 26s | Max: 10m 38s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s
      🟩 Test               Pass: 100%/1   | Total: 10m 38s | Avg: 10m 38s | Max: 10m 38s
    
  • 🟩 python: Pass: 100%/1 | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
8 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Contributor

🟩 CI finished in 1h 25m: Pass: 100%/89 | Total: 2d 12h | Avg: 40m 27s | Max: 1h 12m | Hits: 303%/10936
  • 🟩 cub: Pass: 100%/44 | Total: 1d 13h | Avg: 51m 29s | Max: 1h 12m | Hits: 375%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 11h | Avg: 50m 56s | Max:  1h 12m | Hits: 375%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 09m
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 38m | Avg: 55m 46s | Max: 58m 03s | Hits: 375%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m
      🟩 12.6               Pass: 100%/37  | Total:  1d 06h | Avg: 49m 57s | Max:  1h 12m | Hits: 375%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 03m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 38m | Avg: 55m 46s | Max: 58m 03s | Hits: 375%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m
      🟩 nvcc12.6           Pass: 100%/35  | Total:  1d 04h | Avg: 49m 17s | Max:  1h 12m | Hits: 375%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 03m
      🟩 nvcc               Pass: 100%/42  | Total:  1d 11h | Avg: 51m 00s | Max:  1h 12m | Hits: 375%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 43m | Avg: 55m 49s | Max: 58m 33s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 44m | Avg: 52m 23s | Max: 52m 39s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 51m | Avg: 55m 45s | Max: 58m 23s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 53m | Avg: 56m 52s | Max: 58m 12s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 35m | Avg: 47m 52s | Max:  1h 03m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 52m | Avg: 56m 24s | Max: 58m 38s
      🟩 GCC8               Pass: 100%/1   | Total: 59m 30s | Avg: 59m 30s | Max: 59m 30s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 30s | Max: 56m 47s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 48m | Avg: 54m 00s | Max: 55m 03s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 51m | Avg: 55m 41s | Max: 58m 41s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 56m | Avg: 44m 08s | Max:  1h 01m
      🟩 GCC13              Pass: 100%/8   | Total:  4h 46m | Avg: 35m 47s | Max:  1h 09m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 12m | Hits: 375%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 11m | Hits: 375%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 14h 48m | Avg: 52m 15s | Max:  1h 03m
      🟩 GCC                Pass: 100%/21  | Total: 16h 07m | Avg: 46m 04s | Max:  1h 09m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 31m | Avg:  1h 07m | Max:  1h 12m | Hits: 375%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 09m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 56m 08s | Avg: 28m 04s | Max: 29m 39s
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 59m | Avg: 29m 55s | Max: 55m 57s
      🟩 v100               Pass: 100%/34  | Total:  1d 08h | Avg: 57m 56s | Max:  1h 12m | Hits: 375%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 11h | Avg: 56m 56s | Max:  1h 12m | Hits: 375%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 45s | Avg: 21m 45s | Max: 21m 45s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 11s | Avg: 15m 11s | Max: 15m 11s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 22m | Avg: 27m 28s | Max: 29m 39s
      🟩 TestGPU            Pass: 100%/2   | Total: 39m 30s | Avg: 19m 45s | Max: 20m 45s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 56m 08s | Avg: 28m 04s | Max: 29m 39s
      🟩 90a                Pass: 100%/1   | Total: 22m 42s | Avg: 22m 42s | Max: 22m 42s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 19h 38m | Avg: 58m 55s | Max:  1h 12m | Hits: 375%/2664  
      🟩 20                 Pass: 100%/24  | Total: 18h 06m | Avg: 45m 17s | Max:  1h 11m | Hits: 374%/888   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 21h 43m | Avg: 31m 01s | Max: 1h 00m | Hits: 269%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 35m 50s | Avg: 17m 55s | Max: 24m 45s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total: 20h 45m | Avg: 31m 08s | Max:  1h 00m | Hits: 269%/7384  
      🟩 arm64              Pass: 100%/2   | Total: 57m 37s | Avg: 28m 48s | Max: 30m 11s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 56m | Avg: 35m 15s | Max: 53m 49s | Hits: 269%/1846  
      🟩 12.5               Pass: 100%/2   | Total:  1h 36m | Avg: 48m 29s | Max: 49m 11s
      🟩 12.6               Pass: 100%/35  | Total: 17h 09m | Avg: 29m 25s | Max:  1h 00m | Hits: 269%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 50m 52s | Avg: 25m 26s | Max: 25m 41s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 56m | Avg: 35m 15s | Max: 53m 49s | Hits: 269%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 36m | Avg: 48m 29s | Max: 49m 11s
      🟩 nvcc12.6           Pass: 100%/33  | Total: 16h 18m | Avg: 29m 39s | Max:  1h 00m | Hits: 269%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 50m 52s | Avg: 25m 26s | Max: 25m 41s
      🟩 nvcc               Pass: 100%/40  | Total: 20h 52m | Avg: 31m 18s | Max:  1h 00m | Hits: 269%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 59m | Avg: 29m 49s | Max: 30m 46s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 00m | Avg: 30m 29s | Max: 32m 28s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 01m | Avg: 30m 57s | Max: 32m 28s
      🟩 Clang17            Pass: 100%/2   | Total: 57m 01s | Avg: 28m 30s | Max: 28m 41s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 34m | Avg: 22m 01s | Max: 30m 15s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 16s | Max: 32m 37s
      🟩 GCC8               Pass: 100%/1   | Total: 32m 23s | Avg: 32m 23s | Max: 32m 23s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 03m | Avg: 31m 46s | Max: 32m 47s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 25s | Max: 31m 35s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 21s | Max: 33m 29s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 26s | Max: 33m 27s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 50m | Avg: 21m 21s | Max: 35m 32s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 50m | Avg: 55m 26s | Max: 57m 04s | Hits: 269%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits: 269%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 36m | Avg: 48m 29s | Max: 49m 11s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 33m | Avg: 26m 40s | Max: 32m 28s
      🟩 GCC                Pass: 100%/19  | Total:  8h 41m | Avg: 27m 27s | Max: 35m 32s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 50m | Avg: 57m 44s | Max:  1h 00m | Hits: 269%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 36m | Avg: 48m 29s | Max: 49m 11s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  2h 18m | Avg: 17m 17s | Max: 35m 32s
      🟩 v100               Pass: 100%/34  | Total: 19h 24m | Avg: 34m 15s | Max:  1h 00m | Hits: 269%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 20h 55m | Avg: 33m 55s | Max:  1h 00m | Hits: 269%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 14m 51s | Avg:  7m 25s | Max:  7m 32s
      🟩 TestGPU            Pass: 100%/3   | Total: 32m 57s | Avg: 10m 59s | Max: 11m 18s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 17m 34s | Avg: 17m 34s | Max: 17m 34s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 11h 47m | Avg: 35m 21s | Max:  1h 00m | Hits: 269%/5538  
      🟩 20                 Pass: 100%/20  | Total:  9h 20m | Avg: 28m 00s | Max: 59m 57s | Hits: 269%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 7m 15s | Avg: 3m 37s | Max: 5m 02s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  5m 02s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  5m 02s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  5m 02s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  5m 02s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  5m 02s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  5m 02s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  5m 02s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 13s | Avg:  2m 13s | Max:  2m 13s
      🟩 Test               Pass: 100%/1   | Total:  5m 02s | Avg:  5m 02s | Max:  5m 02s
    
  • 🟩 python: Pass: 100%/1 | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

@elstehle elstehle merged commit c02e845 into NVIDIA:main Jan 30, 2025
100 of 104 checks passed
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add support for large num_items to device_merge.cuh
2 participants