Implement bit sized int types #3956

Open: 6 commits to be merged into main

davebayer
Contributor

  • implement an internal bit sized int type
  • introduce __always_false trait
  • use __uint_t in byteswap and fix the unchecked PTX ISA and device SM versions
  • remove unused cuda/std/__type_traits/make_32_64_or_128_bit.h module
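
As a rough illustration of the first two bullets, a bit-sized unsigned integer selector can be sketched as below. This is a minimal sketch, not the code from this PR: `example_always_false`, `example_uint` and `example_uint_t` are hypothetical stand-ins for the internal `__always_false` and `__uint_t`, whose real definitions are not shown in this conversation.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical names for illustration only; not the identifiers used in the PR.

// A "false" that depends on template parameters, so a static_assert using it
// fires only when the enclosing template is actually instantiated.
template <class...>
struct example_always_false : std::false_type
{};

// Primary template: asking for an unsupported width is a compile-time error.
template <std::size_t Bits>
struct example_uint
{
  static_assert(example_always_false<std::integral_constant<std::size_t, Bits>>::value,
                "unsupported bit width");
};

// One specialization per supported width.
template <> struct example_uint<8>  { using type = std::uint8_t; };
template <> struct example_uint<16> { using type = std::uint16_t; };
template <> struct example_uint<32> { using type = std::uint32_t; };
template <> struct example_uint<64> { using type = std::uint64_t; };

template <std::size_t Bits>
using example_uint_t = typename example_uint<Bits>::type;

// Convenience alias: the unsigned integer with the same width as T.
template <class T>
using example_same_width_uint_t = example_uint_t<sizeof(T) * CHAR_BIT>;

static_assert(std::is_same<example_uint_t<32>, std::uint32_t>::value, "32-bit selection");
static_assert(std::is_same<example_same_width_uint_t<std::int16_t>, std::uint16_t>::value,
              "same-width selection");
```

With a selector like this, a function such as byteswap can pick its working type from the operand size instead of spelling out each integer type by hand.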

@davebayer davebayer requested a review from a team as a code owner February 27, 2025 14:35
@davebayer davebayer requested a review from griwes February 27, 2025 14:35

copy-pr-bot bot commented Feb 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@miscco
Collaborator

miscco commented Feb 27, 2025

/ok to test

Contributor

🟨 CI finished in 1h 29m: Pass: 29%/158 | Total: 2d 06h | Avg: 20m 30s | Max: 1h 19m | Hits: 30%/53485
  • 🟥 thrust: Pass: 0%/45 | Total: 7h 43m | Avg: 10m 17s | Max: 1h 08m

    🟥 cmake_options
      🟥 -DTHRUST_DISPATCH_TYPE=Force32bit Pass:   0%/2   | Total:  6m 07s | Avg:  3m 03s | Max:  6m 07s
    🟥 cpu
      🟥 amd64              Pass:   0%/43  | Total:  7h 31m | Avg: 10m 30s | Max:  1h 08m
      🟥 arm64              Pass:   0%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  5m 59s
    🟥 ctk
      🟥 12.0               Pass:   0%/5   | Total:  1h 22m | Avg: 16m 25s | Max: 58m 39s
      🟥 12.5               Pass:   0%/2   | Total: 18m 44s | Avg:  9m 22s | Max:  9m 43s
      🟥 12.8               Pass:   0%/38  | Total:  6h 02m | Avg:  9m 32s | Max:  1h 08m
    🟥 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  5m 00s
      🟥 nvcc12.0           Pass:   0%/5   | Total:  1h 22m | Avg: 16m 25s | Max: 58m 39s
      🟥 nvcc12.5           Pass:   0%/2   | Total: 18m 44s | Avg:  9m 22s | Max:  9m 43s
      🟥 nvcc12.8           Pass:   0%/36  | Total:  5h 52m | Avg:  9m 47s | Max:  1h 08m
    🟥 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  5m 00s
      🟥 nvcc               Pass:   0%/43  | Total:  7h 33m | Avg: 10m 32s | Max:  1h 08m
    🟥 cxx
      🟥 Clang14            Pass:   0%/4   | Total: 24m 17s | Avg:  6m 04s | Max:  6m 46s
      🟥 Clang15            Pass:   0%/2   | Total: 12m 08s | Avg:  6m 04s | Max:  6m 14s
      🟥 Clang16            Pass:   0%/2   | Total: 12m 27s | Avg:  6m 13s | Max:  6m 14s
      🟥 Clang17            Pass:   0%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 31s
      🟥 Clang18            Pass:   0%/7   | Total: 28m 06s | Avg:  4m 00s | Max:  6m 39s
      🟥 GCC7               Pass:   0%/2   | Total: 12m 12s | Avg:  6m 06s | Max:  6m 21s
      🟥 GCC8               Pass:   0%/1   | Total:  6m 40s | Avg:  6m 40s | Max:  6m 40s
      🟥 GCC9               Pass:   0%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 40s
      🟥 GCC10              Pass:   0%/2   | Total: 12m 34s | Avg:  6m 17s | Max:  6m 25s
      🟥 GCC11              Pass:   0%/2   | Total: 12m 45s | Avg:  6m 22s | Max:  6m 30s
      🟥 GCC12              Pass:   0%/2   | Total: 13m 38s | Avg:  6m 49s | Max:  7m 02s
      🟥 GCC13              Pass:   0%/10  | Total: 37m 37s | Avg:  3m 45s | Max:  7m 05s
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 57m | Avg: 58m 37s | Max: 58m 39s
      🟥 MSVC14.42          Pass:   0%/3   | Total:  2h 09m | Avg: 43m 18s | Max:  1h 08m
      🟥 NVHPC24.7          Pass:   0%/2   | Total: 18m 44s | Avg:  9m 22s | Max:  9m 43s
    🟥 cxx_family
      🟥 Clang              Pass:   0%/17  | Total:  1h 29m | Avg:  5m 16s | Max:  6m 46s
      🟥 GCC                Pass:   0%/21  | Total:  1h 47m | Avg:  5m 08s | Max:  7m 05s
      🟥 MSVC               Pass:   0%/5   | Total:  4h 07m | Avg: 49m 26s | Max:  1h 08m
      🟥 NVHPC              Pass:   0%/2   | Total: 18m 44s | Avg:  9m 22s | Max:  9m 43s
    🟥 gpu
      🟥 h100               Pass:   0%/2   | Total:  5m 00s | Avg:  2m 30s | Max:  5m 00s
      🟥 rtx2080            Pass:   0%/33  | Total:  6h 09m | Avg: 11m 12s | Max:  1h 01m
      🟥 rtx4090            Pass:   0%/10  | Total:  1h 28m | Avg:  8m 50s | Max:  1h 08m
    🟥 jobs
      🟥 Build              Pass:   0%/38  | Total:  7h 43m | Avg: 12m 11s | Max:  1h 08m
      🟥 TestCPU            Pass:   0%/3  
      🟥 TestGPU            Pass:   0%/4  
    🟥 sm
      🟥 90                 Pass:   0%/2   | Total:  5m 00s | Avg:  2m 30s | Max:  5m 00s
      🟥 90;90a;100         Pass:   0%/1   | Total:  6m 54s | Avg:  6m 54s | Max:  6m 54s
    🟥 std
      🟥 17                 Pass:   0%/20  | Total:  4h 47m | Avg: 14m 22s | Max:  1h 01m
      🟥 20                 Pass:   0%/23  | Total:  2h 49m | Avg:  7m 22s | Max:  1h 08m
    
  • 🟨 libcudacxx: Pass: 2%/43 | Total: 2h 44m | Avg: 3m 49s | Max: 20m 13s

    🟨 jobs
      🟥 Build              Pass:   0%/37  | Total:  2h 06m | Avg:  3m 25s | Max: 13m 09s
      🟥 NVRTC              Pass:   0%/2   | Total: 35m 52s | Avg: 17m 56s | Max: 20m 13s
      🟥 Test               Pass:   0%/3  
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 13s | Avg:  2m 13s | Max:  2m 13s
    🟨 cpu
      🟨 amd64              Pass:   2%/41  | Total:  2h 40m | Avg:  3m 55s | Max: 20m 13s
      🟥 arm64              Pass:   0%/2   | Total:  3m 42s | Avg:  1m 51s | Max:  1m 53s
    🟨 ctk
      🟥 12.0               Pass:   0%/5   | Total: 19m 29s | Avg:  3m 53s | Max: 11m 02s
      🟥 12.5               Pass:   0%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  4m 32s
      🟨 12.8               Pass:   2%/36  | Total:  2h 16m | Avg:  3m 46s | Max: 20m 13s
    🟨 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  4m 44s | Avg:  2m 22s | Max:  2m 24s
      🟥 nvcc12.0           Pass:   0%/5   | Total: 19m 29s | Avg:  3m 53s | Max: 11m 02s
      🟥 nvcc12.5           Pass:   0%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  4m 32s
      🟨 nvcc12.8           Pass:   2%/34  | Total:  2h 11m | Avg:  3m 51s | Max: 20m 13s
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  4m 44s | Avg:  2m 22s | Max:  2m 24s
      🟨 nvcc               Pass:   2%/41  | Total:  2h 39m | Avg:  3m 53s | Max: 20m 13s
    🟨 cxx
      🟥 Clang14            Pass:   0%/4   | Total:  9m 01s | Avg:  2m 15s | Max:  2m 18s
      🟥 Clang15            Pass:   0%/2   | Total:  4m 49s | Avg:  2m 24s | Max:  2m 30s
      🟥 Clang16            Pass:   0%/2   | Total:  4m 55s | Avg:  2m 27s | Max:  2m 28s
      🟥 Clang17            Pass:   0%/2   | Total:  4m 50s | Avg:  2m 25s | Max:  2m 31s
      🟥 Clang18            Pass:   0%/6   | Total: 11m 27s | Avg:  1m 54s | Max:  2m 25s
      🟥 GCC7               Pass:   0%/2   | Total:  4m 06s | Avg:  2m 03s | Max:  2m 04s
      🟥 GCC8               Pass:   0%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s
      🟥 GCC9               Pass:   0%/2   | Total:  4m 06s | Avg:  2m 03s | Max:  2m 08s
      🟥 GCC10              Pass:   0%/2   | Total:  4m 30s | Avg:  2m 15s | Max:  2m 17s
      🟥 GCC11              Pass:   0%/2   | Total:  4m 09s | Avg:  2m 04s | Max:  2m 06s
      🟥 GCC12              Pass:   0%/2   | Total:  4m 26s | Avg:  2m 13s | Max:  2m 13s
      🟨 GCC13              Pass:  10%/10  | Total: 49m 02s | Avg:  4m 54s | Max: 20m 13s
      🟥 MSVC14.29          Pass:   0%/2   | Total: 24m 11s | Avg: 12m 05s | Max: 13m 09s
      🟥 MSVC14.42          Pass:   0%/2   | Total: 23m 44s | Avg: 11m 52s | Max: 11m 57s
      🟥 NVHPC24.7          Pass:   0%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  4m 32s
    🟨 cxx_family
      🟥 Clang              Pass:   0%/16  | Total: 35m 02s | Avg:  2m 11s | Max:  2m 31s
      🟨 GCC                Pass:   4%/21  | Total:  1h 12m | Avg:  3m 27s | Max: 20m 13s
      🟥 MSVC               Pass:   0%/4   | Total: 47m 55s | Avg: 11m 58s | Max: 13m 09s
      🟥 NVHPC              Pass:   0%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  4m 32s
    🟨 gpu
      🟥 h100               Pass:   0%/2   | Total:  2m 12s | Avg:  1m 06s | Max:  2m 12s
      🟨 rtx2080            Pass:   2%/41  | Total:  2h 42m | Avg:  3m 57s | Max: 20m 13s
    🟥 sm
      🟥 75                 Pass:   0%/2   | Total: 35m 52s | Avg: 17m 56s | Max: 20m 13s
      🟥 90                 Pass:   0%/2   | Total:  2m 12s | Avg:  1m 06s | Max:  2m 12s
      🟥 90;90a;100         Pass:   0%/1   | Total:  2m 19s | Avg:  2m 19s | Max:  2m 19s
    🟥 std
      🟥 17                 Pass:   0%/21  | Total:  1h 31m | Avg:  4m 22s | Max: 15m 39s
      🟥 20                 Pass:   0%/21  | Total:  1h 10m | Avg:  3m 21s | Max: 20m 13s
    
  • 🟥 cudax: Pass: 0%/22 | Total: 1h 05m | Avg: 2m 58s | Max: 10m 53s

    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/22  | Total:  1h 05m | Avg:  2m 58s | Max: 10m 53s
    🟥 cpu
      🟥 amd64              Pass:   0%/18  | Total: 57m 55s | Avg:  3m 13s | Max: 10m 53s
      🟥 arm64              Pass:   0%/4   | Total:  7m 31s | Avg:  1m 52s | Max:  1m 56s
    🟥 ctk
      🟥 12.0               Pass:   0%/1   | Total: 10m 53s | Avg: 10m 53s | Max: 10m 53s
      🟥 12.5               Pass:   0%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 36s
      🟥 12.8               Pass:   0%/19  | Total: 43m 30s | Avg:  2m 17s | Max: 10m 16s
    🟥 cudacxx
      🟥 nvcc12.0           Pass:   0%/1   | Total: 10m 53s | Avg: 10m 53s | Max: 10m 53s
      🟥 nvcc12.5           Pass:   0%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 36s
      🟥 nvcc12.8           Pass:   0%/19  | Total: 43m 30s | Avg:  2m 17s | Max: 10m 16s
    🟥 cxx
      🟥 Clang14            Pass:   0%/1   | Total:  2m 17s | Avg:  2m 17s | Max:  2m 17s
      🟥 Clang15            Pass:   0%/1   | Total:  2m 27s | Avg:  2m 27s | Max:  2m 27s
      🟥 Clang16            Pass:   0%/1   | Total:  2m 27s | Avg:  2m 27s | Max:  2m 27s
      🟥 Clang17            Pass:   0%/1   | Total:  2m 22s | Avg:  2m 22s | Max:  2m 22s
      🟥 Clang18            Pass:   0%/4   | Total:  6m 14s | Avg:  1m 33s | Max:  2m 24s
      🟥 GCC10              Pass:   0%/1   | Total:  2m 11s | Avg:  2m 11s | Max:  2m 11s
      🟥 GCC11              Pass:   0%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s
      🟥 GCC12              Pass:   0%/2   | Total:  2m 15s | Avg:  1m 07s | Max:  2m 15s
      🟥 GCC13              Pass:   0%/6   | Total: 10m 51s | Avg:  1m 48s | Max:  2m 29s
      🟥 MSVC14.39          Pass:   0%/1   | Total: 10m 53s | Avg: 10m 53s | Max: 10m 53s
      🟥 MSVC14.42          Pass:   0%/1   | Total: 10m 16s | Avg: 10m 16s | Max: 10m 16s
      🟥 NVHPC24.7          Pass:   0%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 36s
    🟥 cxx_family
      🟥 Clang              Pass:   0%/8   | Total: 15m 47s | Avg:  1m 58s | Max:  2m 27s
      🟥 GCC                Pass:   0%/10  | Total: 17m 27s | Avg:  1m 44s | Max:  2m 29s
      🟥 MSVC               Pass:   0%/2   | Total: 21m 09s | Avg: 10m 34s | Max: 10m 53s
      🟥 NVHPC              Pass:   0%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 36s
    🟥 gpu
      🟥 h100               Pass:   0%/2   | Total:  2m 26s | Avg:  1m 13s | Max:  2m 26s
      🟥 rtx2080            Pass:   0%/20  | Total:  1h 03m | Avg:  3m 09s | Max: 10m 53s
    🟥 jobs
      🟥 Build              Pass:   0%/19  | Total:  1h 05m | Avg:  3m 26s | Max: 10m 53s
      🟥 Test               Pass:   0%/3  
    🟥 sm
      🟥 90                 Pass:   0%/3   | Total:  4m 55s | Avg:  1m 38s | Max:  2m 29s
      🟥 90a                Pass:   0%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s
    🟥 std
      🟥 17                 Pass:   0%/4   | Total: 11m 52s | Avg:  2m 58s | Max:  5m 36s
      🟥 20                 Pass:   0%/18  | Total: 53m 34s | Avg:  2m 58s | Max: 10m 53s
    
  • 🟥 cccl_c_parallel: Pass: 0%/2 | Total: 2m 35s | Avg: 1m 17s | Max: 2m 35s

    🟥 cpu
      🟥 amd64              Pass:   0%/2   | Total:  2m 35s | Avg:  1m 17s | Max:  2m 35s
    🟥 ctk
      🟥 12.8               Pass:   0%/2   | Total:  2m 35s | Avg:  1m 17s | Max:  2m 35s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/2   | Total:  2m 35s | Avg:  1m 17s | Max:  2m 35s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/2   | Total:  2m 35s | Avg:  1m 17s | Max:  2m 35s
    🟥 cxx
      🟥 GCC13              Pass:   0%/2   | Total:  2m 35s | Avg:  1m 17s | Max:  2m 35s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/2   | Total:  2m 35s | Avg:  1m 17s | Max:  2m 35s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/2   | Total:  2m 35s | Avg:  1m 17s | Max:  2m 35s
    🟥 jobs
      🟥 Build              Pass:   0%/1   | Total:  2m 35s | Avg:  2m 35s | Max:  2m 35s
      🟥 Test               Pass:   0%/1  
    
  • 🟥 python: Pass: 0%/1 | Total: 3m 28s | Avg: 3m 28s | Max: 3m 28s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s
    🟥 ctk
      🟥 12.8               Pass:   0%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s
    
  • 🟩 cub: Pass: 100%/45 | Total: 1d 18h | Avg: 56m 28s | Max: 1h 19m | Hits: 30%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 16h | Avg: 56m 10s | Max:  1h 19m | Hits:  31%/51055 
      🟩 arm64              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 03m | Hits:  16%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 15m | Avg:  1h 03m | Max:  1h 12m | Hits:  15%/5908  
      🟩 12.5               Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 11m | Hits:  11%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  1d 10h | Avg: 54m 50s | Max:  1h 19m | Hits:  33%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 07m | Hits:  14%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 15m | Avg:  1h 03m | Max:  1h 12m | Hits:  15%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 11m | Hits:  11%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 08h | Avg: 54m 18s | Max:  1h 19m | Hits:  34%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 07m | Hits:  14%/2100  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 16h | Avg: 56m 06s | Max:  1h 19m | Hits:  31%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 02m | Avg:  1h 00m | Max:  1h 04m | Hits:  16%/4868  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 07m | Hits:  16%/2430  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 24s | Max: 59m 50s | Hits:  16%/2430  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 09m | Hits:  16%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  6h 03m | Avg: 51m 55s | Max:  1h 08m | Hits:  40%/8175  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits:  15%/2434  
      🟩 GCC8               Pass: 100%/1   | Total: 58m 07s | Avg: 58m 07s | Max: 58m 07s | Hits:  16%/1217  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 01m | Hits:  16%/2434  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 08m | Hits:  16%/2434  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  15%/2430  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m | Hits:  15%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 09m | Avg: 39m 01s | Max:  1h 19m | Hits:  61%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 29m | Avg:  1h 14m | Max:  1h 16m | Hits:  12%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 29m | Avg:  1h 14m | Max:  1h 15m | Hits:  12%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 11m | Hits:  11%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 16h 22m | Avg: 57m 46s | Max:  1h 09m | Hits:  26%/20333 
      🟩 GCC                Pass: 100%/22  | Total: 18h 38m | Avg: 50m 49s | Max:  1h 19m | Hits:  38%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 59m | Avg:  1h 14m | Max:  1h 16m | Hits:  12%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 11m | Hits:  11%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 14m | Avg: 24m 52s | Max: 29m 49s | Hits:  71%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 13h | Avg:  1h 05m | Max:  1h 19m | Hits:  15%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 06m | Avg: 30m 49s | Max:  1h 01m | Hits:  78%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 15h | Avg:  1h 04m | Max:  1h 19m | Hits:  15%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 51s | Avg: 20m 51s | Max: 20m 51s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 56s | Avg: 16m 56s | Max: 16m 56s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 08m | Avg: 22m 55s | Max: 23m 16s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 02m | Avg: 20m 58s | Max: 21m 33s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 14m | Avg: 24m 52s | Max: 29m 49s | Hits:  71%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 19m | Avg:  1h 19m | Max:  1h 19m | Hits:  15%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 21h 39m | Avg:  1h 04m | Max:  1h 16m | Hits:  15%/23535 
      🟩 20                 Pass: 100%/25  | Total: 20h 41m | Avg: 49m 40s | Max:  1h 19m | Hits:  42%/29950 
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@fbusato
Contributor

fbusato commented Feb 27, 2025

always_false is so common...we could even expose it...

@fbusato
Contributor

fbusato commented Feb 27, 2025

I love to see always_false and the refactoring of byteswap.
A minor concern is that the PR mixes different topics.

@fbusato
Contributor

fbusato commented Feb 27, 2025

also, please note that __builtin_bswap32 and friends are available in GCC < 10, but they are not detected by _CCCL_BUILTIN_BSWAP32

@miscco
Collaborator

miscco commented Feb 27, 2025

also, please note that __builtin_bswap32 and friends are available in GCC < 10, but they are not detected by _CCCL_BUILTIN_BSWAP32

GCC < 10 does not support __has_builtin, which is one of the reasons that builtin.h explicitly enables a ton of builtins for GCC
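
The detection problem described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `EXAMPLE_*` macros and `example_bswap32` are hypothetical names, not the macros in builtin.h, and the explicit GCC branch is deliberately simplified.

```cpp
#include <cstdint>

// Hypothetical macro names; the library's real detection macros may differ.

// Compilers without __has_builtin (such as GCC before 10) report "not available".
#if defined(__has_builtin)
#  define EXAMPLE_HAS_BUILTIN(x) __has_builtin(x)
#else
#  define EXAMPLE_HAS_BUILTIN(x) 0
#endif

// Because the check above cannot see builtins on older GCC, enable the builtin
// explicitly for GCC in addition to asking __has_builtin.
#if EXAMPLE_HAS_BUILTIN(__builtin_bswap32) || (defined(__GNUC__) && !defined(__clang__))
#  define EXAMPLE_BSWAP32(x) __builtin_bswap32(x)
#endif

// Byte-swap a 32-bit value, using the builtin when detected and a portable
// shift-and-mask fallback otherwise.
constexpr std::uint32_t example_bswap32(std::uint32_t v) noexcept
{
#if defined(EXAMPLE_BSWAP32)
  return EXAMPLE_BSWAP32(v);
#else
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
#endif
}

static_assert(example_bswap32(0x11223344u) == 0x44332211u, "bytes are reversed");
```

Both paths give the same result, so the explicit GCC branch only affects which implementation is chosen, not the observable behaviour.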

@davebayer davebayer requested a review from fbusato February 28, 2025 07:03
@miscco
Collaborator

miscco commented Feb 28, 2025

/ok to test

Contributor

🟩 CI finished in 1h 45m: Pass: 100%/158 | Total: 3d 19h | Avg: 34m 38s | Max: 1h 19m | Hits: 35%/249051
  • 🟩 cub: Pass: 100%/45 | Total: 1d 18h | Avg: 56m 22s | Max: 1h 19m | Hits: 30%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 16h | Avg: 55m 58s | Max:  1h 19m | Hits:  31%/51055 
      🟩 arm64              Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 06m | Hits:  16%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 16m | Avg:  1h 03m | Max:  1h 06m | Hits:  15%/5908  
      🟩 12.5               Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 17m | Hits:  11%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  1d 10h | Avg: 54m 29s | Max:  1h 19m | Hits:  33%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 03m | Hits:  14%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 16m | Avg:  1h 03m | Max:  1h 06m | Hits:  15%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 17m | Hits:  11%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 08h | Avg: 53m 59s | Max:  1h 19m | Hits:  34%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 03m | Hits:  14%/2100  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 16h | Avg: 56m 03s | Max:  1h 19m | Hits:  31%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 02m | Avg:  1h 00m | Max:  1h 01m | Hits:  16%/4868  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 05m | Hits:  16%/2430  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 29s | Max: 59m 30s | Hits:  16%/2430  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 07m | Hits:  16%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  6h 01m | Avg: 51m 37s | Max:  1h 06m | Hits:  40%/8175  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  15%/2434  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m | Hits:  16%/1217  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 04m | Hits:  16%/2434  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m | Hits:  16%/2434  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 08m | Hits:  15%/2430  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m | Hits:  15%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 00m | Avg: 38m 15s | Max:  1h 13m | Hits:  61%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 19m | Hits:  12%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 17m | Hits:  12%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 17m | Hits:  11%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 16h 15m | Avg: 57m 21s | Max:  1h 07m | Hits:  26%/20333 
      🟩 GCC                Pass: 100%/22  | Total: 18h 33m | Avg: 50m 36s | Max:  1h 13m | Hits:  38%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 58m | Avg:  1h 14m | Max:  1h 19m | Hits:  12%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 17m | Hits:  11%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 14m | Avg: 24m 42s | Max: 28m 19s | Hits:  71%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 12h | Avg:  1h 04m | Max:  1h 19m | Hits:  15%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 15m | Avg: 31m 53s | Max:  1h 04m | Hits:  78%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 15h | Avg:  1h 03m | Max:  1h 19m | Hits:  15%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 55s | Avg: 20m 55s | Max: 20m 55s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 12s | Avg: 17m 12s | Max: 17m 12s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 09m | Avg: 23m 17s | Max: 24m 09s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 06m | Avg: 22m 10s | Max: 22m 55s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 14m | Avg: 24m 42s | Max: 28m 19s | Hits:  71%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 13m | Avg:  1h 13m | Max:  1h 13m | Hits:  15%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 21h 22m | Avg:  1h 04m | Max:  1h 19m | Hits:  15%/23535 
      🟩 20                 Pass: 100%/25  | Total: 20h 54m | Avg: 50m 10s | Max:  1h 17m | Hits:  42%/29950 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 1d 00h | Avg: 33m 14s | Max: 1h 05m | Hits: 47%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 40m 10s | Avg: 20m 05s | Max: 28m 55s | Hits:  68%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 23h 55m | Avg: 33m 22s | Max:  1h 05m | Hits:  48%/76573 
      🟩 arm64              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 31s | Max: 32m 08s | Hits:  37%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 09m | Avg: 37m 51s | Max: 58m 16s | Hits:  40%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 05m | Hits:   3%/3562  
      🟩 12.8               Pass: 100%/38  | Total: 19h 38m | Avg: 31m 01s | Max:  1h 03m | Hits:  50%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 56m 52s | Avg: 28m 26s | Max: 29m 37s | Hits:  38%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 09m | Avg: 37m 51s | Max: 58m 16s | Hits:  40%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 05m | Hits:   3%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 18h 42m | Avg: 31m 10s | Max:  1h 03m | Hits:  51%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 56m 52s | Avg: 28m 26s | Max: 29m 37s | Hits:  38%/3562  
      🟩 nvcc               Pass: 100%/43  | Total: 23h 59m | Avg: 33m 28s | Max:  1h 05m | Hits:  48%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 37s | Max: 32m 26s | Hits:  49%/7124  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 17s | Max: 34m 05s | Hits:  38%/3562  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 47s | Max: 33m 19s | Hits:  38%/3562  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 07m | Avg: 33m 54s | Max: 34m 35s | Hits:  38%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 47m | Avg: 23m 55s | Max: 32m 55s | Hits:  59%/12467 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 44s | Max: 33m 18s | Hits:  53%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 35m 34s | Avg: 35m 34s | Max: 35m 34s | Hits:  37%/1782  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 10m | Avg: 35m 16s | Max: 35m 34s | Hits:  53%/3564  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 01s | Max: 35m 33s | Hits:  37%/3564  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 18s | Max: 35m 26s | Hits:  37%/3564  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 27s | Max: 35m 00s | Hits:  37%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 45m | Avg: 22m 30s | Max: 33m 55s | Hits:  69%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 56m | Avg: 58m 13s | Max: 58m 16s | Hits:  17%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 33m | Avg: 51m 09s | Max:  1h 03m | Hits:  26%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 05m | Hits:   3%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 13m | Avg: 29m 03s | Max: 34m 35s | Hits:  49%/30277 
      🟩 GCC                Pass: 100%/21  | Total: 10h 04m | Avg: 28m 46s | Max: 35m 34s | Hits:  56%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 29m | Avg: 53m 59s | Max:  1h 03m | Hits:  22%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 05m | Hits:   3%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 34m 37s | Avg: 17m 18s | Max: 23m 44s | Hits:  68%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total: 20h 25m | Avg: 37m 07s | Max:  1h 05m | Hits:  38%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 56m | Avg: 23m 38s | Max:  1h 03m | Hits:  73%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 23h 26m | Avg: 37m 01s | Max:  1h 05m | Hits:  38%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 31s | Avg: 15m 30s | Max: 30m 36s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 42m 59s | Avg: 10m 44s | Max: 11m 15s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 34m 37s | Avg: 17m 18s | Max: 23m 44s | Hits:  68%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total: 32m 02s | Avg: 32m 02s | Max: 32m 02s | Hits:  74%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 52m | Avg: 38m 37s | Max:  1h 05m | Hits:  37%/35611 
      🟩 20                 Pass: 100%/23  | Total: 11h 23m | Avg: 29m 43s | Max:  1h 03m | Hits:  54%/40961 
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 17h 50m | Avg: 24m 54s | Max: 48m 21s | Hits: 28%/103748

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total: 17h 05m | Avg: 25m 00s | Max: 48m 21s | Hits:  28%/98057 
      🟩 arm64              Pass: 100%/2   | Total: 45m 08s | Avg: 22m 34s | Max: 22m 47s | Hits:  29%/5691  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 52m | Avg: 22m 34s | Max: 32m 12s | Hits:  29%/13764 
      🟩 12.5               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 29s | Max: 34m 44s | Hits:  28%/5636  
      🟩 12.8               Pass: 100%/36  | Total: 14h 48m | Avg: 24m 41s | Max: 48m 21s | Hits:  28%/84348 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 45m 36s | Avg: 22m 48s | Max: 24m 15s | Hits:  20%/5652  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 52m | Avg: 22m 34s | Max: 32m 12s | Hits:  29%/13764 
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 08m | Avg: 34m 29s | Max: 34m 44s | Hits:  28%/5636  
      🟩 nvcc12.8           Pass: 100%/34  | Total: 14h 03m | Avg: 24m 48s | Max: 48m 21s | Hits:  29%/78696 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 45m 36s | Avg: 22m 48s | Max: 24m 15s | Hits:  20%/5652  
      🟩 nvcc               Pass: 100%/41  | Total: 17h 05m | Avg: 25m 00s | Max: 48m 21s | Hits:  29%/98096 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 25m | Avg: 21m 19s | Max: 23m 35s | Hits:  29%/11274 
      🟩 Clang15            Pass: 100%/2   | Total: 45m 58s | Avg: 22m 59s | Max: 24m 05s | Hits:  29%/5648  
      🟩 Clang16            Pass: 100%/2   | Total: 47m 14s | Avg: 23m 37s | Max: 24m 38s | Hits:  29%/5648  
      🟩 Clang17            Pass: 100%/2   | Total: 47m 38s | Avg: 23m 49s | Max: 24m 32s | Hits:  29%/5648  
      🟩 Clang18            Pass: 100%/6   | Total:  2h 40m | Avg: 26m 45s | Max: 46m 05s | Hits:  25%/14145 
      🟩 GCC7               Pass: 100%/2   | Total: 43m 54s | Avg: 21m 57s | Max: 23m 17s | Hits:  29%/5586  
      🟩 GCC8               Pass: 100%/1   | Total: 21m 23s | Avg: 21m 23s | Max: 21m 23s | Hits:  28%/2803  
      🟩 GCC9               Pass: 100%/2   | Total: 44m 31s | Avg: 22m 15s | Max: 23m 37s | Hits:  29%/5598  
      🟩 GCC10              Pass: 100%/2   | Total: 45m 14s | Avg: 22m 37s | Max: 24m 15s | Hits:  29%/5654  
      🟩 GCC11              Pass: 100%/2   | Total: 45m 11s | Avg: 22m 35s | Max: 23m 43s | Hits:  28%/5650  
      🟩 GCC12              Pass: 100%/2   | Total: 46m 47s | Avg: 23m 23s | Max: 24m 30s | Hits:  29%/5650  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 37m | Avg: 21m 42s | Max: 48m 21s | Hits:  29%/14406 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 10m | Avg: 35m 28s | Max: 38m 44s | Hits:  30%/5120  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 20m | Avg: 40m 00s | Max: 41m 17s | Hits:  30%/5282  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 08m | Avg: 34m 29s | Max: 34m 44s | Hits:  28%/5636  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  6h 26m | Avg: 24m 10s | Max: 46m 05s | Hits:  27%/42363 
      🟩 GCC                Pass: 100%/21  | Total:  7h 44m | Avg: 22m 05s | Max: 48m 21s | Hits:  29%/45347 
      🟩 MSVC               Pass: 100%/4   | Total:  2h 30m | Avg: 37m 44s | Max: 41m 17s | Hits:  30%/10402 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 29s | Max: 34m 44s | Hits:  28%/5636  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 29m 26s | Avg: 14m 43s | Max: 17m 39s | Hits:  28%/2935  
      🟩 rtx2080            Pass: 100%/41  | Total: 17h 21m | Avg: 25m 23s | Max: 48m 21s | Hits:  28%/100813
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 15h 27m | Avg: 25m 04s | Max: 41m 17s | Hits:  28%/103708
      🟩 NVRTC              Pass: 100%/2   | Total: 34m 16s | Avg: 17m 08s | Max: 19m 09s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total:  1h 46m | Avg: 35m 24s | Max: 48m 21s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 20s | Avg:  2m 20s | Max:  2m 20s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 34m 16s | Avg: 17m 08s | Max: 19m 09s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 29m 26s | Avg: 14m 43s | Max: 17m 39s | Hits:  28%/2935  
      🟩 90;90a;100         Pass: 100%/1   | Total: 33m 37s | Avg: 33m 37s | Max: 33m 37s | Hits:  29%/2935  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  8h 29m | Avg: 24m 15s | Max: 38m 44s | Hits:  28%/55320 
      🟩 20                 Pass: 100%/21  | Total:  9h 19m | Avg: 26m 37s | Max: 48m 21s | Hits:  28%/48428 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 5h 03m | Avg: 13m 48s | Max: 18m 43s | Hits: 40%/11374

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  4h 05m | Avg: 13m 39s | Max: 18m 43s | Hits:  43%/9122  
      🟩 arm64              Pass: 100%/4   | Total: 57m 52s | Avg: 14m 28s | Max: 15m 56s | Hits:  28%/2252  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s | Hits:  47%/262   
      🟩 12.5               Pass: 100%/2   | Total: 19m 05s | Avg:  9m 32s | Max:  9m 37s | Hits:  39%/712   
      🟩 12.8               Pass: 100%/19  | Total:  4h 33m | Avg: 14m 23s | Max: 18m 43s | Hits:  40%/10400 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s | Hits:  47%/262   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 05s | Avg:  9m 32s | Max:  9m 37s | Hits:  39%/712   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  4h 33m | Avg: 14m 23s | Max: 18m 43s | Hits:  40%/10400 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  5h 03m | Avg: 13m 48s | Max: 18m 43s | Hits:  40%/11374 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 15m 34s | Avg: 15m 34s | Max: 15m 34s | Hits:  29%/565   
      🟩 Clang15            Pass: 100%/1   | Total: 16m 12s | Avg: 16m 12s | Max: 16m 12s | Hits:  28%/563   
      🟩 Clang16            Pass: 100%/1   | Total: 16m 24s | Avg: 16m 24s | Max: 16m 24s | Hits:  28%/563   
      🟩 Clang17            Pass: 100%/1   | Total: 16m 13s | Avg: 16m 13s | Max: 16m 13s | Hits:  28%/563   
      🟩 Clang18            Pass: 100%/4   | Total: 55m 07s | Avg: 13m 46s | Max: 15m 41s | Hits:  46%/2252  
      🟩 GCC10              Pass: 100%/1   | Total: 16m 51s | Avg: 16m 51s | Max: 16m 51s | Hits:  28%/565   
      🟩 GCC11              Pass: 100%/1   | Total: 18m 43s | Avg: 18m 43s | Max: 18m 43s | Hits:  28%/563   
      🟩 GCC12              Pass: 100%/2   | Total: 29m 27s | Avg: 14m 43s | Max: 17m 09s | Hits:  64%/1126  
      🟩 GCC13              Pass: 100%/6   | Total:  1h 18m | Avg: 13m 02s | Max: 15m 56s | Hits:  40%/3378  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s | Hits:  47%/262   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 10m 36s | Avg: 10m 36s | Max: 10m 36s | Hits:  47%/262   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 05s | Avg:  9m 32s | Max:  9m 37s | Hits:  39%/712   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total:  1h 59m | Avg: 14m 56s | Max: 16m 24s | Hits:  37%/4506  
      🟩 GCC                Pass: 100%/10  | Total:  2h 23m | Avg: 14m 19s | Max: 18m 43s | Hits:  42%/5632  
      🟩 MSVC               Pass: 100%/2   | Total: 21m 49s | Avg: 10m 54s | Max: 11m 13s | Hits:  47%/524   
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 05s | Avg:  9m 32s | Max:  9m 37s | Hits:  39%/712   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 25m 00s | Avg: 12m 30s | Max: 13m 32s | Hits:  64%/1126  
      🟩 rtx2080            Pass: 100%/20  | Total:  4h 38m | Avg: 13m 56s | Max: 18m 43s | Hits:  38%/10248 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  4h 28m | Avg: 14m 07s | Max: 18m 43s | Hits:  30%/9685  
      🟩 Test               Pass: 100%/3   | Total: 35m 20s | Avg: 11m 46s | Max: 12m 18s | Hits:  99%/1689  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 36m 09s | Avg: 12m 03s | Max: 13m 32s | Hits:  52%/1689  
      🟩 90a                Pass: 100%/1   | Total: 12m 07s | Avg: 12m 07s | Max: 12m 07s | Hits:  28%/563   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 48m 06s | Avg: 12m 01s | Max: 14m 04s | Hits:  30%/2045  
      🟩 20                 Pass: 100%/18  | Total:  4h 15m | Avg: 14m 11s | Max: 18m 43s | Hits:  43%/9329  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 47s | Avg: 7m 53s | Max: 12m 59s | Hits: 97%/308

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 47s | Avg:  7m 53s | Max: 12m 59s | Hits:  97%/308   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 47s | Avg:  7m 53s | Max: 12m 59s | Hits:  97%/308   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 47s | Avg:  7m 53s | Max: 12m 59s | Hits:  97%/308   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 47s | Avg:  7m 53s | Max: 12m 59s | Hits:  97%/308   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 47s | Avg:  7m 53s | Max: 12m 59s | Hits:  97%/308   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 47s | Avg:  7m 53s | Max: 12m 59s | Hits:  97%/308   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 47s | Avg:  7m 53s | Max: 12m 59s | Hits:  97%/308   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 48s | Avg:  2m 48s | Max:  2m 48s | Hits:  95%/154   
      🟩 Test               Pass: 100%/1   | Total: 12m 59s | Avg: 12m 59s | Max: 12m 59s | Hits:  98%/154   
    
  • 🟩 python: Pass: 100%/1 | Total: 51m 19s | Avg: 51m 19s | Max: 51m 19s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 51m 19s | Avg: 51m 19s | Max: 51m 19s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 51m 19s | Avg: 51m 19s | Max: 51m 19s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 51m 19s | Avg: 51m 19s | Max: 51m 19s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 51m 19s | Avg: 51m 19s | Max: 51m 19s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 51m 19s | Avg: 51m 19s | Max: 51m 19s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 51m 19s | Avg: 51m 19s | Max: 51m 19s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 51m 19s | Avg: 51m 19s | Max: 51m 19s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 51m 19s | Avg: 51m 19s | Max: 51m 19s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

miscco commented Feb 28, 2025

/ok to test

🟨 CI finished in 1h 06m: Pass: 98%/158 | Total: 1d 06h | Avg: 11m 36s | Max: 53m 05s | Hits: 77%/249031
  • 🟨 libcudacxx: Pass: 95%/43 | Total: 12h 33m | Avg: 17m 31s | Max: 51m 39s | Hits: 53%/103728

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/41  | Total: 11h 49m | Avg: 17m 18s | Max: 51m 39s | Hits:  54%/98037 
      🟩 arm64              Pass: 100%/2   | Total: 43m 48s | Avg: 21m 54s | Max: 22m 06s | Hits:  36%/5691  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total:  1h 25m | Avg: 17m 09s | Max: 23m 18s | Hits:  49%/13764 
      🟩 12.5               Pass: 100%/2   | Total: 39m 53s | Avg: 19m 56s | Max: 30m 54s | Hits:  67%/5636  
      🔍 12.8               Pass:  94%/36  | Total: 10h 27m | Avg: 17m 26s | Max: 51m 39s | Hits:  52%/84328 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 40m 51s | Avg: 20m 25s | Max: 21m 25s | Hits:  27%/5652  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 25m | Avg: 17m 09s | Max: 23m 18s | Hits:  49%/13764 
      🟩 nvcc12.5           Pass: 100%/2   | Total: 39m 53s | Avg: 19m 56s | Max: 30m 54s | Hits:  67%/5636  
      🔍 nvcc12.8           Pass:  94%/34  | Total:  9h 46m | Avg: 17m 15s | Max: 51m 39s | Hits:  54%/78676 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 40m 51s | Avg: 20m 25s | Max: 21m 25s | Hits:  27%/5652  
      🔍 nvcc               Pass:  95%/41  | Total: 11h 52m | Avg: 17m 22s | Max: 51m 39s | Hits:  54%/98076 
    🔍 gpu: rtx2080 🔍
      🟩 h100               Pass: 100%/2   | Total: 16m 01s | Avg:  8m 00s | Max: 11m 53s | Hits:  99%/2935  
      🔍 rtx2080            Pass:  95%/41  | Total: 12h 17m | Avg: 17m 59s | Max: 51m 39s | Hits:  51%/100793
    🔍 sm: 75 🔍
      🔍 75                 Pass:  50%/2   | Total: 33m 34s | Avg: 16m 47s | Max: 17m 32s | Hits:  90%/20    
      🟩 90                 Pass: 100%/2   | Total: 16m 01s | Avg:  8m 00s | Max: 11m 53s | Hits:  99%/2935  
      🟩 90;90a;100         Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s | Hits:  99%/2935  
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/21  | Total:  6h 28m | Avg: 18m 29s | Max: 30m 54s | Hits:  48%/55320 
      🔍 20                 Pass:  90%/21  | Total:  6h 02m | Avg: 17m 16s | Max: 51m 39s | Hits:  58%/48408 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 02m | Avg: 15m 38s | Max: 20m 45s | Hits:  51%/11274 
      🟩 Clang15            Pass: 100%/2   | Total: 46m 46s | Avg: 23m 23s | Max: 25m 05s | Hits:  36%/5648  
      🟩 Clang16            Pass: 100%/2   | Total: 45m 54s | Avg: 22m 57s | Max: 23m 53s | Hits:  36%/5648  
      🟩 Clang17            Pass: 100%/2   | Total: 46m 08s | Avg: 23m 04s | Max: 24m 20s | Hits:  36%/5648  
      🟨 Clang18            Pass:  83%/6   | Total:  1h 29m | Avg: 14m 56s | Max: 22m 16s | Hits:  45%/14145 
      🟩 GCC7               Pass: 100%/2   | Total: 24m 11s | Avg: 12m 05s | Max: 20m 26s | Hits:  67%/5586  
      🟩 GCC8               Pass: 100%/1   | Total: 20m 16s | Avg: 20m 16s | Max: 20m 16s | Hits:  35%/2803  
      🟩 GCC9               Pass: 100%/2   | Total: 44m 28s | Avg: 22m 14s | Max: 23m 18s | Hits:  36%/5598  
      🟩 GCC10              Pass: 100%/2   | Total: 26m 33s | Avg: 13m 16s | Max: 22m 31s | Hits:  67%/5654  
      🟩 GCC11              Pass: 100%/2   | Total: 24m 01s | Avg: 12m 00s | Max: 19m 59s | Hits:  67%/5650  
      🟩 GCC12              Pass: 100%/2   | Total: 27m 22s | Avg: 13m 41s | Max: 23m 05s | Hits:  67%/5650  
      🟨 GCC13              Pass:  90%/10  | Total:  2h 35m | Avg: 15m 31s | Max: 51m 39s | Hits:  74%/14386 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 48m 19s | Avg: 24m 09s | Max: 25m 01s | Hits:  37%/5120  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 52m 18s | Avg: 26m 09s | Max: 27m 37s | Hits:  37%/5282  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 39m 53s | Avg: 19m 56s | Max: 30m 54s | Hits:  67%/5636  
    🟨 cxx_family
      🟨 Clang              Pass:  93%/16  | Total:  4h 51m | Avg: 18m 11s | Max: 25m 05s | Hits:  43%/42363 
      🟨 GCC                Pass:  95%/21  | Total:  5h 22m | Avg: 15m 20s | Max: 51m 39s | Hits:  63%/45327 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 40m | Avg: 25m 09s | Max: 27m 37s | Hits:  37%/10402 
      🟩 NVHPC              Pass: 100%/2   | Total: 39m 53s | Avg: 19m 56s | Max: 30m 54s | Hits:  67%/5636  
    🟨 jobs
      🟩 Build              Pass: 100%/37  | Total: 10h 54m | Avg: 17m 40s | Max: 30m 54s | Hits:  53%/103708
      🟨 NVRTC              Pass:  50%/2   | Total: 33m 34s | Avg: 16m 47s | Max: 17m 32s | Hits:  90%/20    
      🟨 Test               Pass:  66%/3   | Total:  1h 03m | Avg: 21m 10s | Max: 51m 39s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s
    
  • 🟩 cub: Pass: 100%/45 | Total: 8h 19m | Avg: 11m 06s | Max: 30m 10s | Hits: 93%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  8h 08m | Avg: 11m 21s | Max: 30m 10s | Hits:  92%/51055 
      🟩 arm64              Pass: 100%/2   | Total: 11m 13s | Avg:  5m 36s | Max:  5m 47s | Hits:  99%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 50m 17s | Avg: 10m 03s | Max: 27m 20s | Hits:  85%/5908  
      🟩 12.5               Pass: 100%/2   | Total: 19m 50s | Avg:  9m 55s | Max: 10m 08s | Hits:  98%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  7h 09m | Avg: 11m 18s | Max: 30m 10s | Hits:  94%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 10s | Avg:  4m 35s | Max:  4m 38s | Hits: 100%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 50m 17s | Avg: 10m 03s | Max: 27m 20s | Hits:  85%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 50s | Avg:  9m 55s | Max: 10m 08s | Hits:  98%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  7h 00m | Avg: 11m 40s | Max: 30m 10s | Hits:  93%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 10s | Avg:  4m 35s | Max:  4m 38s | Hits: 100%/2100  
      🟩 nvcc               Pass: 100%/43  | Total:  8h 10m | Avg: 11m 24s | Max: 30m 10s | Hits:  92%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 23m 59s | Avg:  5m 59s | Max:  6m 27s | Hits: 100%/4868  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 46s | Avg:  6m 23s | Max:  6m 36s | Hits: 100%/2430  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 30s | Avg:  6m 15s | Max:  6m 16s | Hits: 100%/2430  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 25s | Hits: 100%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 12m | Avg: 10m 23s | Max: 24m 18s | Hits:  99%/8175  
      🟩 GCC7               Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  6m 15s | Hits:  99%/2434  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 22s | Avg:  6m 22s | Max:  6m 22s | Hits:  99%/1217  
      🟩 GCC9               Pass: 100%/2   | Total: 12m 51s | Avg:  6m 25s | Max:  6m 50s | Hits:  99%/2434  
      🟩 GCC10              Pass: 100%/2   | Total: 13m 18s | Avg:  6m 39s | Max:  6m 49s | Hits:  99%/2434  
      🟩 GCC11              Pass: 100%/2   | Total: 13m 28s | Avg:  6m 44s | Max:  7m 10s | Hits:  99%/2430  
      🟩 GCC12              Pass: 100%/2   | Total: 13m 34s | Avg:  6m 47s | Max:  7m 07s | Hits:  99%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  2h 40m | Avg: 14m 32s | Max: 24m 14s | Hits:  99%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 54m 44s | Avg: 27m 22s | Max: 27m 24s | Hits:  15%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 58m 58s | Avg: 29m 29s | Max: 30m 10s | Hits:  15%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 50s | Avg:  9m 55s | Max: 10m 08s | Hits:  98%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 14m | Avg:  7m 54s | Max: 24m 18s | Hits:  99%/20333 
      🟩 GCC                Pass: 100%/22  | Total:  3h 51m | Avg: 10m 31s | Max: 24m 14s | Hits:  99%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 53m | Avg: 28m 25s | Max: 30m 10s | Hits:  15%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 50s | Avg:  9m 55s | Max: 10m 08s | Hits:  98%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 49m 53s | Avg: 16m 37s | Max: 23m 28s | Hits:  99%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 06m | Avg:  9m 01s | Max: 30m 10s | Hits:  91%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 23m | Avg: 17m 52s | Max: 24m 18s | Hits:  99%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 23m | Avg:  8m 45s | Max: 30m 10s | Hits:  91%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 07s | Avg: 21m 07s | Max: 21m 07s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 14s | Avg: 17m 14s | Max: 17m 14s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 00s | Max: 24m 18s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 05m | Avg: 21m 46s | Max: 21m 57s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 49m 53s | Avg: 16m 37s | Max: 23m 28s | Hits:  99%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 45s | Avg:  6m 45s | Max:  6m 45s | Hits:  99%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 13m | Avg:  9m 41s | Max: 28m 48s | Hits:  88%/23535 
      🟩 20                 Pass: 100%/25  | Total:  5h 05m | Avg: 12m 14s | Max: 30m 10s | Hits:  96%/29950 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 35m | Avg: 8m 47s | Max: 31m 40s | Hits: 96%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 17s | Avg:  8m 38s | Max: 11m 12s | Hits:  99%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 25m | Avg:  8m 57s | Max: 31m 40s | Hits:  96%/76573 
      🟩 arm64              Pass: 100%/2   | Total: 10m 05s | Avg:  5m 02s | Max:  5m 26s | Hits:  99%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 42m 03s | Avg:  8m 24s | Max: 22m 13s | Hits:  94%/8901  
      🟩 12.5               Pass: 100%/2   | Total: 29m 12s | Avg: 14m 36s | Max: 14m 54s | Hits:  99%/3562  
      🟩 12.8               Pass: 100%/38  | Total:  5h 24m | Avg:  8m 31s | Max: 31m 40s | Hits:  96%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 37s | Avg:  5m 18s | Max:  5m 27s | Hits: 100%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 42m 03s | Avg:  8m 24s | Max: 22m 13s | Hits:  94%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 29m 12s | Avg: 14m 36s | Max: 14m 54s | Hits:  99%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 13m | Avg:  8m 42s | Max: 31m 40s | Hits:  96%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 37s | Avg:  5m 18s | Max:  5m 27s | Hits: 100%/3562  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 24m | Avg:  8m 56s | Max: 31m 40s | Hits:  96%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 11s | Avg:  5m 02s | Max:  5m 22s | Hits: 100%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  5m 52s | Hits: 100%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 56s | Avg:  5m 28s | Max:  5m 33s | Hits: 100%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 07s | Avg:  5m 33s | Max:  5m 34s | Hits: 100%/3562  
      🟩 Clang18            Pass: 100%/7   | Total: 44m 26s | Avg:  6m 20s | Max: 10m 13s | Hits: 100%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 20s | Avg:  5m 10s | Max:  5m 23s | Hits:  99%/3564  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 15s | Avg:  5m 15s | Max:  5m 15s | Hits:  99%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 43s | Avg:  5m 21s | Max:  5m 25s | Hits:  99%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 11s | Avg:  5m 35s | Max:  5m 38s | Hits:  99%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 12m 02s | Avg:  6m 01s | Max:  6m 04s | Hits:  99%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 43s | Avg:  5m 51s | Max:  5m 53s | Hits:  99%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 17m | Avg:  7m 47s | Max: 11m 43s | Hits:  99%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 46m 28s | Avg: 23m 14s | Max: 24m 15s | Hits:  70%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 22m | Avg: 27m 30s | Max: 31m 40s | Hits:  70%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 29m 12s | Avg: 14m 36s | Max: 14m 54s | Hits:  99%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 38m | Avg:  5m 46s | Max: 10m 13s | Hits: 100%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  2h 19m | Avg:  6m 37s | Max: 11m 43s | Hits:  99%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 08m | Avg: 25m 47s | Max: 31m 40s | Hits:  70%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total: 29m 12s | Avg: 14m 36s | Max: 14m 54s | Hits:  99%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 20s | Avg:  8m 10s | Max: 11m 43s | Hits:  99%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 13m | Avg:  7m 40s | Max: 24m 42s | Hits:  97%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 05m | Avg: 12m 33s | Max: 31m 40s | Hits:  94%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  5h 02m | Avg:  7m 57s | Max: 26m 08s | Hits:  96%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 09s | Avg: 16m 03s | Max: 31m 40s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 34s | Avg: 11m 08s | Max: 11m 43s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 20s | Avg:  8m 10s | Max: 11m 43s | Hits:  99%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 00s | Avg:  6m 00s | Max:  6m 00s | Hits:  99%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 53m | Avg:  8m 41s | Max: 24m 42s | Hits:  95%/35611 
      🟩 20                 Pass: 100%/23  | Total:  3h 24m | Avg:  8m 52s | Max: 31m 40s | Hits:  97%/40961 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 1h 56m | Avg: 5m 17s | Max: 14m 17s | Hits: 97%/11374

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  1h 45m | Avg:  5m 50s | Max: 14m 17s | Hits:  97%/9122  
      🟩 arm64              Pass: 100%/4   | Total: 11m 15s | Avg:  2m 48s | Max:  2m 51s | Hits:  99%/2252  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 21s | Avg:  9m 21s | Max:  9m 21s | Hits:  61%/262   
      🟩 12.5               Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 00s | Hits:  96%/712   
      🟩 12.8               Pass: 100%/19  | Total:  1h 35m | Avg:  5m 00s | Max: 14m 17s | Hits:  98%/10400 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 21s | Avg:  9m 21s | Max:  9m 21s | Hits:  61%/262   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 00s | Hits:  96%/712   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 35m | Avg:  5m 00s | Max: 14m 17s | Hits:  98%/10400 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  1h 56m | Avg:  5m 17s | Max: 14m 17s | Hits:  97%/11374 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 25s | Avg:  3m 25s | Max:  3m 25s | Hits: 100%/565   
      🟩 Clang15            Pass: 100%/1   | Total:  3m 19s | Avg:  3m 19s | Max:  3m 19s | Hits: 100%/563   
      🟩 Clang16            Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s | Hits: 100%/563   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 19s | Avg:  3m 19s | Max:  3m 19s | Hits: 100%/563   
      🟩 Clang18            Pass: 100%/4   | Total: 20m 14s | Avg:  5m 03s | Max: 11m 21s | Hits: 100%/2252  
      🟩 GCC10              Pass: 100%/1   | Total:  3m 24s | Avg:  3m 24s | Max:  3m 24s | Hits:  99%/565   
      🟩 GCC11              Pass: 100%/1   | Total:  3m 30s | Avg:  3m 30s | Max:  3m 30s | Hits:  99%/563   
      🟩 GCC12              Pass: 100%/2   | Total: 15m 50s | Avg:  7m 55s | Max: 12m 31s | Hits:  99%/1126  
      🟩 GCC13              Pass: 100%/6   | Total: 29m 24s | Avg:  4m 54s | Max: 14m 17s | Hits:  99%/3378  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 21s | Avg:  9m 21s | Max:  9m 21s | Hits:  61%/262   
      🟩 MSVC14.42          Pass: 100%/1   | Total:  9m 23s | Avg:  9m 23s | Max:  9m 23s | Hits:  61%/262   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 00s | Hits:  96%/712   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 33m 39s | Avg:  4m 12s | Max: 11m 21s | Hits: 100%/4506  
      🟩 GCC                Pass: 100%/10  | Total: 52m 08s | Avg:  5m 12s | Max: 14m 17s | Hits:  99%/5632  
      🟩 MSVC               Pass: 100%/2   | Total: 18m 44s | Avg:  9m 22s | Max:  9m 23s | Hits:  61%/524   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 00s | Hits:  96%/712   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 25s | Avg:  8m 42s | Max: 14m 17s | Hits:  99%/1126  
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 39m | Avg:  4m 57s | Max: 12m 31s | Hits:  97%/10248 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 18m | Avg:  4m 07s | Max:  9m 23s | Hits:  97%/9685  
      🟩 Test               Pass: 100%/3   | Total: 38m 09s | Avg: 12m 43s | Max: 14m 17s | Hits:  99%/1689  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 20m 31s | Avg:  6m 50s | Max: 14m 17s | Hits:  99%/1689  
      🟩 90a                Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s | Hits:  99%/563   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 14m 48s | Avg:  3m 42s | Max:  6m 00s | Hits:  99%/2045  
      🟩 20                 Pass: 100%/18  | Total:  1h 41m | Avg:  5m 39s | Max: 14m 17s | Hits:  97%/9329  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 37s | Avg: 7m 48s | Max: 13m 23s | Hits: 98%/308

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 37s | Avg:  7m 48s | Max: 13m 23s | Hits:  98%/308   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 37s | Avg:  7m 48s | Max: 13m 23s | Hits:  98%/308   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 37s | Avg:  7m 48s | Max: 13m 23s | Hits:  98%/308   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 37s | Avg:  7m 48s | Max: 13m 23s | Hits:  98%/308   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 37s | Avg:  7m 48s | Max: 13m 23s | Hits:  98%/308   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 37s | Avg:  7m 48s | Max: 13m 23s | Hits:  98%/308   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 37s | Avg:  7m 48s | Max: 13m 23s | Hits:  98%/308   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 14s | Avg:  2m 14s | Max:  2m 14s | Hits:  98%/154   
      🟩 Test               Pass: 100%/1   | Total: 13m 23s | Avg: 13m 23s | Max: 13m 23s | Hits:  98%/154   
    
  • 🟩 python: Pass: 100%/1 | Total: 53m 05s | Avg: 53m 05s | Max: 53m 05s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 53m 05s | Avg: 53m 05s | Max: 53m 05s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 53m 05s | Avg: 53m 05s | Max: 53m 05s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 53m 05s | Avg: 53m 05s | Max: 53m 05s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 53m 05s | Avg: 53m 05s | Max: 53m 05s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 53m 05s | Avg: 53m 05s | Max: 53m 05s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 53m 05s | Avg: 53m 05s | Max: 53m 05s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 53m 05s | Avg: 53m 05s | Max: 53m 05s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 53m 05s | Avg: 53m 05s | Max: 53m 05s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

asm("mov.b64 {%0, %1}, %2;" : "=r"(__hi), "=r"(__lo) : "l"(__val));
asm("prmt.b32 %0, %0, 0, 0x0123;" : "+r"(__hi));
asm("prmt.b32 %0, %0, 0, 0x0123;" : "+r"(__lo));
_CCCL_NODISCARD _CCCL_HIDE_FROM_ABI _CCCL_DEVICE uint16_t __byteswap_impl_device(uint16_t __val) noexcept

What about using if constexpr instead of function overloads?
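
To illustrate the suggestion, here is a minimal host-only sketch of a single function template that selects the implementation by operand size with if constexpr, rather than providing one overload per width. The name example_byteswap and the bit-twiddling bodies are assumptions made for this illustration; they are not the PR's code, which uses builtins and PTX on the device path. It assumes C++17.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not part of the PR): the branch is chosen at compile
// time from sizeof(T), so only one branch is instantiated per type.
template <class T>
constexpr T example_byteswap(T val) noexcept
{
  static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
  if constexpr (sizeof(T) == 1)
  {
    return val; // a single byte has nothing to swap
  }
  else if constexpr (sizeof(T) == 2)
  {
    return static_cast<T>((val << 8) | (val >> 8));
  }
  else if constexpr (sizeof(T) == 4)
  {
    return static_cast<T>(((val & 0x000000FFu) << 24) | ((val & 0x0000FF00u) << 8)
                          | ((val & 0x00FF0000u) >> 8) | ((val & 0xFF000000u) >> 24));
  }
  else
  {
    static_assert(sizeof(T) == 8, "this sketch only handles up to 64 bits");
    // Swap each 32-bit half, then exchange the halves.
    const std::uint32_t lo = example_byteswap(static_cast<std::uint32_t>(val));
    const std::uint32_t hi = example_byteswap(static_cast<std::uint32_t>(val >> 32));
    return static_cast<T>((static_cast<T>(lo) << 32) | hi);
  }
}
```

Because the discarded branches are never instantiated, the 64-bit shift in the last branch is not compiled for narrower types. The trade-off against overloads is that every width lives in one body.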

}
return __impl_recursive<uint16_t>(__val);
#endif // !_CCCL_BUILTIN_BSWAP32
# if __cccl_ptx_isa >= 200

I'm starting to think that the __cccl_ptx_isa guard is not needed even here.

# if _CCCL_COMPILER(MSVC)
NV_IF_TARGET(NV_IS_HOST, return _byteswap_ulong(__val);)

Suggested change
NV_IF_TARGET(NV_IS_HOST, return _byteswap_ulong(__val);)
NV_IF_TARGET(NV_IS_HOST, return ::_byteswap_ulong(__val);)

}
return __result;
__result <<= __shift;
__result |= (__val >> (__i * __shift)) & static_cast<_Tp>(_CUDA_VSTD::numeric_limits<uint8_t>::max());

Suggested change
__result |= (__val >> (__i * __shift)) & static_cast<_Tp>(_CUDA_VSTD::numeric_limits<uint8_t>::max());
__result |= (__val >> (__i * __shift)) & _Tp{numeric_limits<uint8_t>::max()};


or

Suggested change
__result |= (__val >> (__i * __shift)) & static_cast<_Tp>(_CUDA_VSTD::numeric_limits<uint8_t>::max());
__result |= (__val >> (__i * __shift)) & static_cast<_Tp>(~_Tp{0});
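
For reference, a minimal host-only sketch of the byte-by-byte fallback loop that this line belongs to. The name example_byteswap_loop is an assumption for illustration and the loop body is reconstructed from the context lines shown, not copied from the PR. The mask in this sketch has only the low eight bits set, which is what isolates one byte per iteration; an all-ones value such as ~T{0} would pass every bit of the shifted operand through, so the sketch does not use that form.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the PR): reverses the bytes of an unsigned
// integer by moving one byte per iteration from val into result.
template <class T>
constexpr T example_byteswap_loop(T val) noexcept
{
  constexpr T byte_mask = static_cast<T>(0xFF); // selects the lowest byte only
  T result{};
  for (std::size_t i = 0; i < sizeof(T); ++i)
  {
    // Make room for the next byte, then append byte i of val.
    result = static_cast<T>(result << CHAR_BIT);
    result = static_cast<T>(result | ((val >> (i * CHAR_BIT)) & byte_mask));
  }
  return result;
}
```

The explicit casts back to T keep the arithmetic well defined for types narrower than int, which are promoted before shifting.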

constexpr auto __shift = numeric_limits<uint8_t>::digits;

_Tp __result{};
for (size_t __i{}; __i < sizeof(__val); ++__i)

Here we would need a portable #pragma unroll.
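
One way such a portable unroll hint could look, as a rough sketch only. The macro names EXAMPLE_PRAGMA_UNROLL and EXAMPLE_STRINGIZE are hypothetical and are not CCCL macros. The sketch assumes that NVCC and Clang accept "#pragma unroll N", that GCC 8 and later accept "#pragma GCC unroll N", and it expands to nothing for any other compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical macros (not CCCL's): pick the unroll pragma spelling per compiler.
#define EXAMPLE_STRINGIZE(x) #x
#if defined(__CUDACC__) || defined(__clang__)
#  define EXAMPLE_PRAGMA_UNROLL(N) _Pragma(EXAMPLE_STRINGIZE(unroll N))
#elif defined(__GNUC__) && (__GNUC__ >= 8)
#  define EXAMPLE_PRAGMA_UNROLL(N) _Pragma(EXAMPLE_STRINGIZE(GCC unroll N))
#else
#  define EXAMPLE_PRAGMA_UNROLL(N) // no known unroll pragma: leave it to the optimizer
#endif

// Byte reversal of a 32-bit value with an unroll hint on the loop.
inline std::uint32_t example_byteswap_unrolled(std::uint32_t val) noexcept
{
  std::uint32_t result = 0;
  EXAMPLE_PRAGMA_UNROLL(4)
  for (std::size_t i = 0; i < sizeof(val); ++i)
  {
    result <<= 8;
    result |= (val >> (i * 8)) & 0xFFu;
  }
  return result;
}

// Usage: the hint only affects code generation, never the result.
inline void example_unroll_usage()
{
  assert(example_byteswap_unrolled(0x11223344u) == 0x44332211u);
}
```

The pragma is only a hint, so the function behaves identically on compilers where the macro is empty.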
