Add `cuda::is_floating_point` supporting half and bfloat #3379

bernhardmgruber · 2025-01-14T09:47:19Z

This PR adds a new header, `cuda/type_traits`, which introduces a new trait, `cuda::is_floating_point<T>`. The trait defaults to `cuda::std::is_floating_point<T>` but also provides specializations for half and bfloat.
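As a rough illustration of the pattern described above (not the PR's actual code), a trait can defer to the standard trait by default and opt extra types in through explicit specializations. The sketch below uses plain `std::is_floating_point` rather than `cuda::std::is_floating_point` so that it compiles without the CUDA toolkit. The namespace `sketch` and the types `fake_half` and `fake_bfloat16` are hypothetical stand-ins for the real half and bfloat types, and stripping cv-qualifiers before the lookup is a choice made for this sketch only.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-ins for the half and bfloat types. Assumption: like the
// real types, they are class types that the standard trait does not recognize.
struct fake_half { unsigned short bits; };
struct fake_bfloat16 { unsigned short bits; };

namespace detail {
// Default: defer to the standard trait.
template <class T>
struct is_fp_impl : std::is_floating_point<T> {};

// Opt the extended types in.
template <>
struct is_fp_impl<fake_half> : std::true_type {};
template <>
struct is_fp_impl<fake_bfloat16> : std::true_type {};
} // namespace detail

// Strip cv-qualifiers first so that `const fake_half` behaves like `const float`.
template <class T>
struct is_floating_point : detail::is_fp_impl<typename std::remove_cv<T>::type> {};

template <class T>
constexpr bool is_floating_point_v = is_floating_point<T>::value;

} // namespace sketch

// Built-in floating point types are still accepted.
static_assert(sketch::is_floating_point_v<float>, "float is floating point");
static_assert(sketch::is_floating_point_v<const double>, "cv-qualified double too");
// The extended types are accepted by the extension trait only.
static_assert(sketch::is_floating_point_v<sketch::fake_half>, "extension accepts half");
static_assert(sketch::is_floating_point_v<const sketch::fake_bfloat16>, "and bfloat");
static_assert(!std::is_floating_point<sketch::fake_half>::value, "standard trait unchanged");
// Everything else is rejected.
static_assert(!sketch::is_floating_point_v<int>, "int is not floating point");
```

The standard trait keeps its standard-mandated answer in this arrangement; only code that explicitly asks the extension trait sees the additional types.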

davebayer · 2025-01-14T10:52:08Z

Maybe it would be worth going through the code and replacing all uses of `__is_extended_floating_point_v<T>` in `is_floating_point_v<T> && __is_extended_floating_point_v<T>`.

miscco

Unfortunately, that is not something we intend to do yet.

We are quite strict about what we allow in `cuda::std`, and about keeping it strictly standard compliant.

Changing that type trait would definitely break standard behavior, so we can only enable this in the context of adding support for C++26 extended floating point types.

wmaxey · 2025-01-15T22:56:14Z

> Unfortunately, that is not something we intend to do yet.
>
> We are quite strict about what we allow in `cuda::std`, and about keeping it strictly standard compliant.
>
> Changing that type trait would definitely break standard behavior, so we can only enable this in the context of adding support for C++26 extended floating point types.

We can expose it as `cuda::` pending standardization.

miscco · 2025-01-16T07:20:50Z

> We can expose it as `cuda::` pending standardization.

I mean, they are standardized, just under a different name, like `float16_t`.

jrhemstad · 2025-01-16T13:45:09Z

I agree, I think we should add this as a `cuda::` extension. The existing library types are going to exist indefinitely, so we should still provide this machinery, just not as part of `cuda::std::`.

libcudacxx/include/cuda/__type_traits/is_floating_point.h

github-actions · 2025-01-21T13:23:57Z

🟨 CI finished in 1h 57m: Pass: 99%/144 | Total: 1d 23h | Avg: 19m 40s | Max: 1h 04m | Hits: 509%/25812

🟨 libcudacxx: Pass: 97%/46 | Total: 8h 38m | Avg: 11m 16s | Max: 35m 12s | Hits: 682%/12570

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  97%/44  | Total:  8h 28m | Avg: 11m 33s | Max: 35m 12s | Hits: 682%/12570 
  🟩 arm64              Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  6m 40s
🔍 ctk: 12.6 🔍
  🟩 12.0               Pass: 100%/8   | Total:  1h 13m | Avg:  9m 12s | Max: 22m 45s | Hits: 682%/4906  
  🟩 12.5               Pass: 100%/2   | Total: 47m 04s | Avg: 23m 32s | Max: 35m 12s
  🔍 12.6               Pass:  97%/36  | Total:  6h 38m | Avg: 11m 03s | Max: 28m 23s | Hits: 682%/7664  
🔍 cudacxx: nvcc12.6 🔍
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 05m | Avg: 16m 17s | Max: 21m 16s
  🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 13m | Avg:  9m 12s | Max: 22m 45s | Hits: 682%/4906  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 47m 04s | Avg: 23m 32s | Max: 35m 12s
  🔍 nvcc12.6           Pass:  96%/32  | Total:  5h 32m | Avg: 10m 24s | Max: 28m 23s | Hits: 682%/7664  
🔍 cudacxx_family: nvcc 🔍
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 05m | Avg: 16m 17s | Max: 21m 16s
  🔍 nvcc               Pass:  97%/42  | Total:  7h 33m | Avg: 10m 47s | Max: 35m 12s | Hits: 682%/12570 
🔍 cxx: GCC13 🔍
  🟩 Clang14            Pass: 100%/6   | Total: 52m 32s | Avg:  8m 45s | Max: 12m 39s
  🟩 Clang15            Pass: 100%/1   | Total:  8m 06s | Avg:  8m 06s | Max:  8m 06s
  🟩 Clang16            Pass: 100%/1   | Total:  4m 11s | Avg:  4m 11s | Max:  4m 11s
  🟩 Clang17            Pass: 100%/1   | Total:  8m 25s | Avg:  8m 25s | Max:  8m 25s
  🟩 Clang18            Pass: 100%/8   | Total:  1h 39m | Avg: 12m 28s | Max: 21m 16s
  🟩 GCC7               Pass: 100%/5   | Total: 16m 46s | Avg:  3m 21s | Max:  3m 49s
  🟩 GCC8               Pass: 100%/1   | Total:  3m 44s | Avg:  3m 44s | Max:  3m 44s
  🟩 GCC9               Pass: 100%/3   | Total:  9m 40s | Avg:  3m 13s | Max:  3m 39s
  🟩 GCC10              Pass: 100%/1   | Total:  4m 03s | Avg:  4m 03s | Max:  4m 03s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 35s | Avg:  3m 35s | Max:  3m 35s
  🟩 GCC12              Pass: 100%/1   | Total:  3m 56s | Avg:  3m 56s | Max:  3m 56s
  🔍 GCC13              Pass:  90%/10  | Total:  2h 16m | Avg: 13m 37s | Max: 28m 23s
  🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 08m | Avg: 22m 40s | Max: 24m 04s | Hits: 682%/7410  
  🟩 MSVC14.39          Pass: 100%/2   | Total: 52m 42s | Avg: 26m 21s | Max: 28m 22s | Hits: 682%/5160  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 47m 04s | Avg: 23m 32s | Max: 35m 12s
🔍 cxx_family: GCC 🔍
  🟩 Clang              Pass: 100%/17  | Total:  2h 53m | Avg: 10m 10s | Max: 21m 16s
  🔍 GCC                Pass:  95%/22  | Total:  2h 57m | Avg:  8m 05s | Max: 28m 23s
  🟩 MSVC               Pass: 100%/5   | Total:  2h 00m | Avg: 24m 08s | Max: 28m 22s | Hits: 682%/12570 
  🟩 NVHPC              Pass: 100%/2   | Total: 47m 04s | Avg: 23m 32s | Max: 35m 12s
🔍 jobs: Test 🔍
  🟩 Build              Pass: 100%/39  | Total:  6h 20m | Avg:  9m 45s | Max: 35m 12s | Hits: 682%/12570 
  🟩 NVRTC              Pass: 100%/4   | Total:  1h 43m | Avg: 25m 46s | Max: 28m 23s
  🔍 Test               Pass:  50%/2   | Total: 32m 54s | Avg: 16m 27s | Max: 16m 32s
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 01s | Avg:  2m 01s | Max:  2m 01s
🔍 std: 20 🔍
  🟩 11                 Pass: 100%/6   | Total: 56m 26s | Avg:  9m 24s | Max: 23m 04s
  🟩 14                 Pass: 100%/4   | Total: 58m 39s | Avg: 14m 39s | Max: 27m 21s | Hits: 682%/2412  
  🟩 17                 Pass: 100%/14  | Total:  2h 45m | Avg: 11m 48s | Max: 28m 23s | Hits: 682%/7502  
  🔍 20                 Pass:  95%/21  | Total:  3h 56m | Avg: 11m 15s | Max: 35m 12s | Hits: 681%/2656  
🟨 gpu
  🟨 v100               Pass:  97%/46  | Total:  8h 38m | Avg: 11m 16s | Max: 35m 12s | Hits: 682%/12570 
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 13m 48s | Avg: 13m 48s | Max: 13m 48s
  🟩 90a                Pass: 100%/2   | Total: 16m 15s | Avg:  8m 07s | Max: 12m 29s

🟩 cub: Pass: 100%/38 | Total: 1d 00h | Avg: 39m 04s | Max: 1h 04m | Hits: 433%/3540

🟩 cpu
  🟩 amd64              Pass: 100%/36  | Total: 23h 01m | Avg: 38m 21s | Max:  1h 04m | Hits: 433%/3540  
  🟩 arm64              Pass: 100%/2   | Total:  1h 43m | Avg: 51m 51s | Max: 55m 51s
🟩 ctk
  🟩 12.0               Pass: 100%/5   | Total:  3h 27m | Avg: 41m 29s | Max: 58m 08s | Hits: 443%/885   
  🟩 12.5               Pass: 100%/2   | Total:  1h 29m | Avg: 44m 57s | Max: 45m 40s
  🟩 12.6               Pass: 100%/31  | Total: 19h 47m | Avg: 38m 18s | Max:  1h 04m | Hits: 430%/2655  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 43m | Avg: 51m 55s | Max: 53m 45s
  🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 27m | Avg: 41m 29s | Max: 58m 08s | Hits: 443%/885   
  🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 29m | Avg: 44m 57s | Max: 45m 40s
  🟩 nvcc12.6           Pass: 100%/29  | Total: 18h 03m | Avg: 37m 21s | Max:  1h 04m | Hits: 430%/2655  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 43m | Avg: 51m 55s | Max: 53m 45s
  🟩 nvcc               Pass: 100%/36  | Total: 23h 00m | Avg: 38m 21s | Max:  1h 04m | Hits: 433%/3540  
🟩 cxx
  🟩 Clang14            Pass: 100%/4   | Total:  2h 36m | Avg: 39m 08s | Max: 41m 15s
  🟩 Clang15            Pass: 100%/1   | Total: 38m 22s | Avg: 38m 22s | Max: 38m 22s
  🟩 Clang16            Pass: 100%/1   | Total: 36m 47s | Avg: 36m 47s | Max: 36m 47s
  🟩 Clang17            Pass: 100%/1   | Total: 37m 18s | Avg: 37m 18s | Max: 37m 18s
  🟩 Clang18            Pass: 100%/7   | Total:  5h 05m | Avg: 43m 42s | Max: 53m 45s
  🟩 GCC7               Pass: 100%/2   | Total:  1h 15m | Avg: 37m 48s | Max: 38m 24s
  🟩 GCC8               Pass: 100%/1   | Total: 51m 47s | Avg: 51m 47s | Max: 51m 47s
  🟩 GCC9               Pass: 100%/2   | Total:  1h 14m | Avg: 37m 07s | Max: 37m 28s
  🟩 GCC10              Pass: 100%/1   | Total: 39m 08s | Avg: 39m 08s | Max: 39m 08s
  🟩 GCC11              Pass: 100%/1   | Total: 37m 09s | Avg: 37m 09s | Max: 37m 09s
  🟩 GCC12              Pass: 100%/3   | Total:  1h 02m | Avg: 20m 45s | Max: 38m 29s
  🟩 GCC13              Pass: 100%/8   | Total:  3h 53m | Avg: 29m 14s | Max: 55m 51s
  🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 04m | Hits: 439%/1770  
  🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 03m | Hits: 427%/1770  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 29m | Avg: 44m 57s | Max: 45m 40s
🟩 cxx_family
  🟩 Clang              Pass: 100%/14  | Total:  9h 34m | Avg: 41m 04s | Max: 53m 45s
  🟩 GCC                Pass: 100%/18  | Total:  9h 34m | Avg: 31m 53s | Max: 55m 51s
  🟩 MSVC               Pass: 100%/4   | Total:  4h 05m | Avg:  1h 01m | Max:  1h 04m | Hits: 433%/3540  
  🟩 NVHPC              Pass: 100%/2   | Total:  1h 29m | Avg: 44m 57s | Max: 45m 40s
🟩 gpu
  🟩 h100               Pass: 100%/2   | Total: 23m 47s | Avg: 11m 53s | Max: 19m 23s
  🟩 v100               Pass: 100%/36  | Total:  1d 00h | Avg: 40m 35s | Max:  1h 04m | Hits: 433%/3540  
🟩 jobs
  🟩 Build              Pass: 100%/31  | Total: 21h 37m | Avg: 41m 50s | Max:  1h 04m | Hits: 433%/3540  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 24m 28s | Avg: 24m 28s | Max: 24m 28s
  🟩 GraphCapture       Pass: 100%/1   | Total: 20m 16s | Avg: 20m 16s | Max: 20m 16s
  🟩 HostLaunch         Pass: 100%/3   | Total:  1h 05m | Avg: 21m 58s | Max: 25m 32s
  🟩 TestGPU            Pass: 100%/2   | Total:  1h 17m | Avg: 38m 32s | Max: 48m 35s
🟩 sm
  🟩 90                 Pass: 100%/2   | Total: 23m 47s | Avg: 11m 53s | Max: 19m 23s
  🟩 90a                Pass: 100%/1   | Total:  4m 16s | Avg:  4m 16s | Max:  4m 16s
🟩 std
  🟩 17                 Pass: 100%/14  | Total: 10h 40m | Avg: 45m 43s | Max:  1h 04m | Hits: 435%/2655  
  🟩 20                 Pass: 100%/24  | Total: 14h 04m | Avg: 35m 11s | Max: 59m 50s | Hits: 427%/885

🟩 thrust: Pass: 100%/37 | Total: 10h 45m | Avg: 17m 26s | Max: 55m 46s | Hits: 308%/9180

🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 25m 27s | Avg: 12m 43s | Max: 19m 44s
🟩 cpu
  🟩 amd64              Pass: 100%/35  | Total: 10h 30m | Avg: 18m 01s | Max: 55m 46s | Hits: 308%/9180  
  🟩 arm64              Pass: 100%/2   | Total: 14m 15s | Avg:  7m 07s | Max:  9m 40s
🟩 ctk
  🟩 12.0               Pass: 100%/5   | Total:  1h 50m | Avg: 22m 10s | Max: 44m 18s | Hits: 298%/1836  
  🟩 12.5               Pass: 100%/2   | Total:  1h 10m | Avg: 35m 27s | Max: 37m 55s
  🟩 12.6               Pass: 100%/30  | Total:  7h 43m | Avg: 15m 26s | Max: 55m 46s | Hits: 310%/7344  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total: 14m 46s | Avg:  7m 23s | Max:  9m 32s
  🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 50m | Avg: 22m 10s | Max: 44m 18s | Hits: 298%/1836  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 10m | Avg: 35m 27s | Max: 37m 55s
  🟩 nvcc12.6           Pass: 100%/28  | Total:  7h 28m | Avg: 16m 01s | Max: 55m 46s | Hits: 310%/7344  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total: 14m 46s | Avg:  7m 23s | Max:  9m 32s
  🟩 nvcc               Pass: 100%/35  | Total: 10h 30m | Avg: 18m 00s | Max: 55m 46s | Hits: 308%/9180  
🟩 cxx
  🟩 Clang14            Pass: 100%/4   | Total: 41m 37s | Avg: 10m 24s | Max: 12m 17s
  🟩 Clang15            Pass: 100%/1   | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s
  🟩 Clang16            Pass: 100%/1   | Total:  8m 58s | Avg:  8m 58s | Max:  8m 58s
  🟩 Clang17            Pass: 100%/1   | Total: 11m 14s | Avg: 11m 14s | Max: 11m 14s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 03m | Avg:  9m 00s | Max: 21m 20s
  🟩 GCC7               Pass: 100%/2   | Total: 21m 12s | Avg: 10m 36s | Max: 11m 20s
  🟩 GCC8               Pass: 100%/1   | Total: 11m 48s | Avg: 11m 48s | Max: 11m 48s
  🟩 GCC9               Pass: 100%/2   | Total: 44m 46s | Avg: 22m 23s | Max: 35m 14s
  🟩 GCC10              Pass: 100%/1   | Total: 13m 15s | Avg: 13m 15s | Max: 13m 15s
  🟩 GCC11              Pass: 100%/1   | Total:  9m 48s | Avg:  9m 48s | Max:  9m 48s
  🟩 GCC12              Pass: 100%/1   | Total: 12m 45s | Avg: 12m 45s | Max: 12m 45s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 26m | Avg: 10m 52s | Max: 19m 44s
  🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 35m | Avg: 47m 30s | Max: 50m 42s | Hits: 295%/3672  
  🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 22m | Avg: 47m 39s | Max: 55m 46s | Hits: 316%/5508  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 10m | Avg: 35m 27s | Max: 37m 55s
🟩 cxx_family
  🟩 Clang              Pass: 100%/14  | Total:  2h 15m | Avg:  9m 41s | Max: 21m 20s
  🟩 GCC                Pass: 100%/16  | Total:  3h 20m | Avg: 12m 32s | Max: 35m 14s
  🟩 MSVC               Pass: 100%/5   | Total:  3h 57m | Avg: 47m 35s | Max: 55m 46s | Hits: 308%/9180  
  🟩 NVHPC              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 27s | Max: 37m 55s
🟩 gpu
  🟩 v100               Pass: 100%/37  | Total: 10h 45m | Avg: 17m 26s | Max: 55m 46s | Hits: 308%/9180  
🟩 jobs
  🟩 Build              Pass: 100%/31  | Total:  8h 56m | Avg: 17m 19s | Max: 55m 46s | Hits: 294%/7344  
  🟩 TestCPU            Pass: 100%/3   | Total: 51m 53s | Avg: 17m 17s | Max: 36m 00s | Hits: 365%/1836  
  🟩 TestGPU            Pass: 100%/3   | Total: 56m 17s | Avg: 18m 45s | Max: 21m 20s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
🟩 std
  🟩 17                 Pass: 100%/14  | Total:  4h 58m | Avg: 21m 21s | Max: 51m 11s | Hits: 294%/5508  
  🟩 20                 Pass: 100%/21  | Total:  5h 20m | Avg: 15m 16s | Max: 55m 46s | Hits: 328%/3672

🟩 cudax: Pass: 100%/20 | Total: 1h 53m | Avg: 5m 41s | Max: 19m 09s | Hits: 388%/522

🟩 cpu
  🟩 amd64              Pass: 100%/16  | Total:  1h 43m | Avg:  6m 27s | Max: 19m 09s | Hits: 388%/522   
  🟩 arm64              Pass: 100%/4   | Total: 10m 35s | Avg:  2m 38s | Max:  2m 47s
🟩 ctk
  🟩 12.0               Pass: 100%/1   | Total: 11m 08s | Avg: 11m 08s | Max: 11m 08s | Hits: 388%/261   
  🟩 12.5               Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  5m 27s
  🟩 12.6               Pass: 100%/17  | Total:  1h 32m | Avg:  5m 24s | Max: 19m 09s | Hits: 388%/261   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 08s | Avg: 11m 08s | Max: 11m 08s | Hits: 388%/261   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  5m 27s
  🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 32m | Avg:  5m 24s | Max: 19m 09s | Hits: 388%/261   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/20  | Total:  1h 53m | Avg:  5m 41s | Max: 19m 09s | Hits: 388%/522   
🟩 cxx
  🟩 Clang14            Pass: 100%/1   | Total:  3m 03s | Avg:  3m 03s | Max:  3m 03s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 30s | Avg:  3m 30s | Max:  3m 30s
  🟩 Clang16            Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
  🟩 Clang17            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
  🟩 Clang18            Pass: 100%/4   | Total: 27m 12s | Avg:  6m 48s | Max: 18m 42s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 02s | Avg:  3m 02s | Max:  3m 02s
  🟩 GCC11              Pass: 100%/1   | Total:  2m 55s | Avg:  2m 55s | Max:  2m 55s
  🟩 GCC12              Pass: 100%/2   | Total: 22m 21s | Avg: 11m 10s | Max: 19m 09s
  🟩 GCC13              Pass: 100%/4   | Total: 10m 32s | Avg:  2m 38s | Max:  2m 46s
  🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 08s | Avg: 11m 08s | Max: 11m 08s | Hits: 388%/261   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 50s | Avg: 12m 50s | Max: 12m 50s | Hits: 388%/261   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  5m 27s
🟩 cxx_family
  🟩 Clang              Pass: 100%/8   | Total: 40m 20s | Avg:  5m 02s | Max: 18m 42s
  🟩 GCC                Pass: 100%/8   | Total: 38m 50s | Avg:  4m 51s | Max: 19m 09s
  🟩 MSVC               Pass: 100%/2   | Total: 23m 58s | Avg: 11m 59s | Max: 12m 50s | Hits: 388%/522   
  🟩 NVHPC              Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  5m 27s
🟩 gpu
  🟩 v100               Pass: 100%/20  | Total:  1h 53m | Avg:  5m 41s | Max: 19m 09s | Hits: 388%/522   
🟩 jobs
  🟩 Build              Pass: 100%/18  | Total:  1h 16m | Avg:  4m 13s | Max: 12m 50s | Hits: 388%/522   
  🟩 Test               Pass: 100%/2   | Total: 37m 51s | Avg: 18m 55s | Max: 19m 09s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s
  🟩 90a                Pass: 100%/1   | Total:  2m 37s | Avg:  2m 37s | Max:  2m 37s
🟩 std
  🟩 17                 Pass: 100%/4   | Total: 13m 25s | Avg:  3m 21s | Max:  5m 27s
  🟩 20                 Pass: 100%/16  | Total:  1h 40m | Avg:  6m 17s | Max: 19m 09s | Hits: 388%/522

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 39s | Avg: 4m 49s | Max: 7m 36s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  7m 36s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  7m 36s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  7m 36s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  7m 36s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  7m 36s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  7m 36s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  7m 36s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s
  🟩 Test               Pass: 100%/1   | Total:  7m 36s | Avg:  7m 36s | Max:  7m 36s

🟩 python: Pass: 100%/1 | Total: 1h 01m | Avg: 1h 01m | Max: 1h 01m

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 144)

#	Runner
98	`linux-amd64-cpu16`
19	`linux-amd64-gpu-v100-latest-1`
16	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`
1	`linux-amd64-gpu-h100-latest-1-testing`

Co-authored-by: Michael Schellenberger Costa

github-actions · 2025-01-21T15:05:11Z

🟩 CI finished in 1h 30m: Pass: 100%/144 | Total: 1d 03h | Avg: 11m 16s | Max: 1h 22m | Hits: 544%/25812

🟩 libcudacxx: Pass: 100%/46 | Total: 8h 43m | Avg: 11m 22s | Max: 36m 38s | Hits: 682%/12570

🟩 cpu
  🟩 amd64              Pass: 100%/44  | Total:  8h 36m | Avg: 11m 43s | Max: 36m 38s | Hits: 682%/12570 
  🟩 arm64              Pass: 100%/2   | Total:  7m 00s | Avg:  3m 30s | Max:  3m 37s
🟩 ctk
  🟩 12.0               Pass: 100%/8   | Total:  1h 09m | Avg:  8m 43s | Max: 20m 57s | Hits: 682%/4906  
  🟩 12.5               Pass: 100%/2   | Total: 21m 50s | Avg: 10m 55s | Max: 13m 43s
  🟩 12.6               Pass: 100%/36  | Total:  7h 11m | Avg: 11m 59s | Max: 36m 38s | Hits: 682%/7664  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 07m | Avg: 16m 48s | Max: 21m 37s
  🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 09m | Avg:  8m 43s | Max: 20m 57s | Hits: 682%/4906  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 21m 50s | Avg: 10m 55s | Max: 13m 43s
  🟩 nvcc12.6           Pass: 100%/32  | Total:  6h 04m | Avg: 11m 23s | Max: 36m 38s | Hits: 682%/7664  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 07m | Avg: 16m 48s | Max: 21m 37s
  🟩 nvcc               Pass: 100%/42  | Total:  7h 35m | Avg: 10m 51s | Max: 36m 38s | Hits: 682%/12570 
🟩 cxx
  🟩 Clang14            Pass: 100%/6   | Total: 47m 51s | Avg:  7m 58s | Max: 17m 09s
  🟩 Clang15            Pass: 100%/1   | Total:  7m 23s | Avg:  7m 23s | Max:  7m 23s
  🟩 Clang16            Pass: 100%/1   | Total:  4m 23s | Avg:  4m 23s | Max:  4m 23s
  🟩 Clang17            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s
  🟩 Clang18            Pass: 100%/8   | Total:  1h 53m | Avg: 14m 14s | Max: 30m 42s
  🟩 GCC7               Pass: 100%/5   | Total: 16m 46s | Avg:  3m 21s | Max:  3m 40s
  🟩 GCC8               Pass: 100%/1   | Total:  3m 49s | Avg:  3m 49s | Max:  3m 49s
  🟩 GCC9               Pass: 100%/3   | Total:  9m 51s | Avg:  3m 17s | Max:  3m 43s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 36s | Avg:  3m 36s | Max:  3m 36s
  🟩 GCC11              Pass: 100%/1   | Total:  4m 03s | Avg:  4m 03s | Max:  4m 03s
  🟩 GCC12              Pass: 100%/1   | Total:  4m 10s | Avg:  4m 10s | Max:  4m 10s
  🟩 GCC13              Pass: 100%/10  | Total:  2h 40m | Avg: 16m 05s | Max: 36m 38s
  🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 07m | Avg: 22m 21s | Max: 25m 48s | Hits: 682%/7410  
  🟩 MSVC14.39          Pass: 100%/2   | Total: 53m 39s | Avg: 26m 49s | Max: 26m 54s | Hits: 682%/5160  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 21m 50s | Avg: 10m 55s | Max: 13m 43s
🟩 cxx_family
  🟩 Clang              Pass: 100%/17  | Total:  2h 57m | Avg: 10m 26s | Max: 30m 42s
  🟩 GCC                Pass: 100%/22  | Total:  3h 23m | Avg:  9m 14s | Max: 36m 38s
  🟩 MSVC               Pass: 100%/5   | Total:  2h 00m | Avg: 24m 08s | Max: 26m 54s | Hits: 682%/12570 
  🟩 NVHPC              Pass: 100%/2   | Total: 21m 50s | Avg: 10m 55s | Max: 13m 43s
🟩 gpu
  🟩 v100               Pass: 100%/46  | Total:  8h 43m | Avg: 11m 22s | Max: 36m 38s | Hits: 682%/12570 
🟩 jobs
  🟩 Build              Pass: 100%/39  | Total:  5h 46m | Avg:  8m 53s | Max: 26m 54s | Hits: 682%/12570 
  🟩 NVRTC              Pass: 100%/4   | Total:  2h 06m | Avg: 31m 37s | Max: 36m 38s
  🟩 Test               Pass: 100%/2   | Total: 48m 04s | Avg: 24m 02s | Max: 30m 42s
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 54s | Avg:  1m 54s | Max:  1m 54s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 13m 37s | Avg: 13m 37s | Max: 13m 37s
  🟩 90a                Pass: 100%/2   | Total: 18m 08s | Avg:  9m 04s | Max: 14m 16s
🟩 std
  🟩 11                 Pass: 100%/6   | Total: 56m 51s | Avg:  9m 28s | Max: 32m 09s
  🟩 14                 Pass: 100%/4   | Total:  1h 17m | Avg: 19m 15s | Max: 35m 54s | Hits: 682%/2412  
  🟩 17                 Pass: 100%/14  | Total:  2h 31m | Avg: 10m 47s | Max: 26m 54s | Hits: 682%/7502  
  🟩 20                 Pass: 100%/21  | Total:  3h 56m | Avg: 11m 15s | Max: 36m 38s | Hits: 681%/2656

🟩 cub: Pass: 100%/38 | Total: 8h 34m | Avg: 13m 33s | Max: 1h 22m | Hits: 539%/3540

🟩 cpu
  🟩 amd64              Pass: 100%/36  | Total:  8h 25m | Avg: 14m 01s | Max:  1h 22m | Hits: 539%/3540  
  🟩 arm64              Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 07s
🟩 ctk
  🟩 12.0               Pass: 100%/5   | Total: 46m 01s | Avg:  9m 12s | Max: 25m 30s | Hits: 539%/885   
  🟩 12.5               Pass: 100%/2   | Total: 19m 31s | Avg:  9m 45s | Max:  9m 57s
  🟩 12.6               Pass: 100%/31  | Total:  7h 29m | Avg: 14m 29s | Max:  1h 22m | Hits: 539%/2655  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 16s | Avg:  4m 38s | Max:  4m 43s
  🟩 nvcc12.0           Pass: 100%/5   | Total: 46m 01s | Avg:  9m 12s | Max: 25m 30s | Hits: 539%/885   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 31s | Avg:  9m 45s | Max:  9m 57s
  🟩 nvcc12.6           Pass: 100%/29  | Total:  7h 20m | Avg: 15m 10s | Max:  1h 22m | Hits: 539%/2655  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 16s | Avg:  4m 38s | Max:  4m 43s
  🟩 nvcc               Pass: 100%/36  | Total:  8h 25m | Avg: 14m 02s | Max:  1h 22m | Hits: 539%/3540  
🟩 cxx
  🟩 Clang14            Pass: 100%/4   | Total: 20m 51s | Avg:  5m 12s | Max:  5m 55s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 53s | Avg:  5m 53s | Max:  5m 53s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 38m | Avg: 14m 00s | Max: 48m 23s
  🟩 GCC7               Pass: 100%/2   | Total: 10m 36s | Avg:  5m 18s | Max:  5m 23s
  🟩 GCC8               Pass: 100%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
  🟩 GCC9               Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  5m 46s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 59s | Avg:  5m 59s | Max:  5m 59s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
  🟩 GCC12              Pass: 100%/3   | Total: 29m 26s | Avg:  9m 48s | Max: 19m 18s
  🟩 GCC13              Pass: 100%/8   | Total:  3h 01m | Avg: 22m 44s | Max:  1h 22m
  🟩 MSVC14.29          Pass: 100%/2   | Total: 52m 15s | Avg: 26m 07s | Max: 26m 45s | Hits: 539%/1770  
  🟩 MSVC14.39          Pass: 100%/2   | Total: 56m 57s | Avg: 28m 28s | Max: 29m 40s | Hits: 539%/1770  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 31s | Avg:  9m 45s | Max:  9m 57s
🟩 cxx_family
  🟩 Clang              Pass: 100%/14  | Total:  2h 15m | Avg:  9m 42s | Max: 48m 23s
  🟩 GCC                Pass: 100%/18  | Total:  4h 10m | Avg: 13m 54s | Max:  1h 22m
  🟩 MSVC               Pass: 100%/4   | Total:  1h 49m | Avg: 27m 18s | Max: 29m 40s | Hits: 539%/3540  
  🟩 NVHPC              Pass: 100%/2   | Total: 19m 31s | Avg:  9m 45s | Max:  9m 57s
🟩 gpu
  🟩 h100               Pass: 100%/2   | Total: 23m 42s | Avg: 11m 51s | Max: 19m 18s
  🟩 v100               Pass: 100%/36  | Total:  8h 11m | Avg: 13m 38s | Max:  1h 22m | Hits: 539%/3540  
🟩 jobs
  🟩 Build              Pass: 100%/31  | Total:  4h 21m | Avg:  8m 26s | Max: 29m 40s | Hits: 539%/3540  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 05s | Avg: 21m 05s | Max: 21m 05s
  🟩 GraphCapture       Pass: 100%/1   | Total:  1h 22m | Avg:  1h 22m | Max:  1h 22m
  🟩 HostLaunch         Pass: 100%/3   | Total:  1h 31m | Avg: 30m 35s | Max: 48m 23s
  🟩 TestGPU            Pass: 100%/2   | Total: 57m 50s | Avg: 28m 55s | Max: 33m 57s
🟩 sm
  🟩 90                 Pass: 100%/2   | Total: 23m 42s | Avg: 11m 51s | Max: 19m 18s
  🟩 90a                Pass: 100%/1   | Total:  4m 02s | Avg:  4m 02s | Max:  4m 02s
🟩 std
  🟩 17                 Pass: 100%/14  | Total:  2h 23m | Avg: 10m 16s | Max: 27m 17s | Hits: 539%/2655  
  🟩 20                 Pass: 100%/24  | Total:  6h 11m | Avg: 15m 28s | Max:  1h 22m | Hits: 539%/885

🟩 thrust: Pass: 100%/37 | Total: 7h 05m | Avg: 11m 29s | Max: 33m 39s | Hits: 365%/9180

🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 36m 45s | Avg: 18m 22s | Max: 30m 19s
🟩 cpu
  🟩 amd64              Pass: 100%/35  | Total:  6h 55m | Avg: 11m 52s | Max: 33m 39s | Hits: 365%/9180  
  🟩 arm64              Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  5m 00s
🟩 ctk
  🟩 12.0               Pass: 100%/5   | Total: 56m 09s | Avg: 11m 13s | Max: 28m 45s | Hits: 365%/1836  
  🟩 12.5               Pass: 100%/2   | Total: 27m 32s | Avg: 13m 46s | Max: 13m 59s
  🟩 12.6               Pass: 100%/30  | Total:  5h 41m | Avg: 11m 22s | Max: 33m 39s | Hits: 365%/7344  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 16s | Avg:  5m 08s | Max:  5m 12s
  🟩 nvcc12.0           Pass: 100%/5   | Total: 56m 09s | Avg: 11m 13s | Max: 28m 45s | Hits: 365%/1836  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 27m 32s | Avg: 13m 46s | Max: 13m 59s
  🟩 nvcc12.6           Pass: 100%/28  | Total:  5h 31m | Avg: 11m 49s | Max: 33m 39s | Hits: 365%/7344  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 16s | Avg:  5m 08s | Max:  5m 12s
  🟩 nvcc               Pass: 100%/35  | Total:  6h 54m | Avg: 11m 51s | Max: 33m 39s | Hits: 365%/9180  
🟩 cxx
  🟩 Clang14            Pass: 100%/4   | Total: 21m 48s | Avg:  5m 27s | Max:  6m 00s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 13s | Avg:  5m 13s | Max:  5m 13s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 41s | Avg:  5m 41s | Max:  5m 41s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 07m | Avg:  9m 36s | Max: 33m 03s
  🟩 GCC7               Pass: 100%/2   | Total: 10m 20s | Avg:  5m 10s | Max:  5m 12s
  🟩 GCC8               Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
  🟩 GCC9               Pass: 100%/2   | Total: 17m 36s | Avg:  8m 48s | Max: 11m 56s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
  🟩 GCC12              Pass: 100%/1   | Total:  6m 18s | Avg:  6m 18s | Max:  6m 18s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 24m | Avg: 10m 36s | Max: 30m 19s
  🟩 MSVC14.29          Pass: 100%/2   | Total: 58m 00s | Avg: 29m 00s | Max: 29m 15s | Hits: 365%/3672  
  🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 38m | Avg: 32m 47s | Max: 33m 39s | Hits: 365%/5508  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 27m 32s | Avg: 13m 46s | Max: 13m 59s
🟩 cxx_family
  🟩 Clang              Pass: 100%/14  | Total:  1h 45m | Avg:  7m 31s | Max: 33m 03s
  🟩 GCC                Pass: 100%/16  | Total:  2h 15m | Avg:  8m 28s | Max: 30m 19s
  🟩 MSVC               Pass: 100%/5   | Total:  2h 36m | Avg: 31m 16s | Max: 33m 39s | Hits: 365%/9180  
  🟩 NVHPC              Pass: 100%/2   | Total: 27m 32s | Avg: 13m 46s | Max: 13m 59s
🟩 gpu
  🟩 v100               Pass: 100%/37  | Total:  7h 05m | Avg: 11m 29s | Max: 33m 39s | Hits: 365%/9180  
🟩 jobs
  🟩 Build              Pass: 100%/31  | Total:  4h 52m | Avg:  9m 26s | Max: 32m 51s | Hits: 365%/7344  
  🟩 TestCPU            Pass: 100%/3   | Total: 49m 17s | Avg: 16m 25s | Max: 33m 39s | Hits: 365%/1836  
  🟩 TestGPU            Pass: 100%/3   | Total:  1h 23m | Avg: 27m 41s | Max: 33m 03s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 16s | Avg:  4m 16s | Max:  4m 16s
🟩 std
  🟩 17                 Pass: 100%/14  | Total:  2h 44m | Avg: 11m 45s | Max: 31m 52s | Hits: 365%/5508  
  🟩 20                 Pass: 100%/21  | Total:  3h 43m | Avg: 10m 39s | Max: 33m 39s | Hits: 365%/3672

🟩 cudax: Pass: 100%/20 | Total: 1h 48m | Avg: 5m 25s | Max: 20m 18s | Hits: 388%/522

🟩 cpu
  🟩 amd64              Pass: 100%/16  | Total:  1h 38m | Avg:  6m 08s | Max: 20m 18s | Hits: 388%/522   
  🟩 arm64              Pass: 100%/4   | Total: 10m 14s | Avg:  2m 33s | Max:  2m 34s
🟩 ctk
  🟩 12.0               Pass: 100%/1   | Total: 10m 53s | Avg: 10m 53s | Max: 10m 53s | Hits: 388%/261   
  🟩 12.5               Pass: 100%/2   | Total: 10m 48s | Avg:  5m 24s | Max:  5m 26s
  🟩 12.6               Pass: 100%/17  | Total:  1h 26m | Avg:  5m 06s | Max: 20m 18s | Hits: 388%/261   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 53s | Avg: 10m 53s | Max: 10m 53s | Hits: 388%/261   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 48s | Avg:  5m 24s | Max:  5m 26s
  🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 26m | Avg:  5m 06s | Max: 20m 18s | Hits: 388%/261   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/20  | Total:  1h 48m | Avg:  5m 25s | Max: 20m 18s | Hits: 388%/522   
🟩 cxx
  🟩 Clang14            Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 19s | Avg:  3m 19s | Max:  3m 19s
  🟩 Clang16            Pass: 100%/1   | Total:  3m 09s | Avg:  3m 09s | Max:  3m 09s
  🟩 Clang17            Pass: 100%/1   | Total:  3m 06s | Avg:  3m 06s | Max:  3m 06s
  🟩 Clang18            Pass: 100%/4   | Total: 28m 30s | Avg:  7m 07s | Max: 20m 18s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 01s | Avg:  3m 01s | Max:  3m 01s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
  🟩 GCC12              Pass: 100%/2   | Total: 17m 47s | Avg:  8m 53s | Max: 14m 38s
  🟩 GCC13              Pass: 100%/4   | Total: 10m 18s | Avg:  2m 34s | Max:  2m 36s
  🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 53s | Avg: 10m 53s | Max: 10m 53s | Hits: 388%/261   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s | Hits: 388%/261   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 48s | Avg:  5m 24s | Max:  5m 26s
🟩 cxx_family
  🟩 Clang              Pass: 100%/8   | Total: 41m 20s | Avg:  5m 10s | Max: 20m 18s
  🟩 GCC                Pass: 100%/8   | Total: 34m 19s | Avg:  4m 17s | Max: 14m 38s
  🟩 MSVC               Pass: 100%/2   | Total: 21m 56s | Avg: 10m 58s | Max: 11m 03s | Hits: 388%/522   
  🟩 NVHPC              Pass: 100%/2   | Total: 10m 48s | Avg:  5m 24s | Max:  5m 26s
🟩 gpu
  🟩 v100               Pass: 100%/20  | Total:  1h 48m | Avg:  5m 25s | Max: 20m 18s | Hits: 388%/522   
🟩 jobs
  🟩 Build              Pass: 100%/18  | Total:  1h 13m | Avg:  4m 04s | Max: 11m 03s | Hits: 388%/522   
  🟩 Test               Pass: 100%/2   | Total: 34m 56s | Avg: 17m 28s | Max: 20m 18s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 35s | Avg:  2m 35s | Max:  2m 35s
  🟩 90a                Pass: 100%/1   | Total:  2m 36s | Avg:  2m 36s | Max:  2m 36s
🟩 std
  🟩 17                 Pass: 100%/4   | Total: 13m 09s | Avg:  3m 17s | Max:  5m 26s
  🟩 20                 Pass: 100%/16  | Total:  1h 35m | Avg:  5m 57s | Max: 20m 18s | Hits: 388%/522

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 38s | Avg: 4m 49s | Max: 7m 30s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  7m 30s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  7m 30s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  7m 30s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  7m 30s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  7m 30s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  7m 30s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  7m 30s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 08s | Avg:  2m 08s | Max:  2m 08s
  🟩 Test               Pass: 100%/1   | Total:  7m 30s | Avg:  7m 30s | Max:  7m 30s

🟩 python: Pass: 100%/1 | Total: 42m 46s | Avg: 42m 46s | Max: 42m 46s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 42m 46s | Avg: 42m 46s | Max: 42m 46s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 42m 46s | Avg: 42m 46s | Max: 42m 46s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 42m 46s | Avg: 42m 46s | Max: 42m 46s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 42m 46s | Avg: 42m 46s | Max: 42m 46s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 42m 46s | Avg: 42m 46s | Max: 42m 46s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 42m 46s | Avg: 42m 46s | Max: 42m 46s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 42m 46s | Avg: 42m 46s | Max: 42m 46s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 42m 46s | Avg: 42m 46s | Max: 42m 46s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 144)

#	Runner
98	`linux-amd64-cpu16`
19	`linux-amd64-gpu-v100-latest-1`
16	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`
1	`linux-amd64-gpu-h100-latest-1-testing`

Co-authored-by: Michael Schellenberger Costa <[email protected]>

@shwina

update docs update docs add `memcmp`, `memmove` and `memchr` implementations implement tests Use cuda::std::min/max in Thrust (NVIDIA#3364) Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (NVIDIA#3361) * implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` Cleanup util_arch (NVIDIA#2773) Deprecate thrust::null_type (NVIDIA#3367) Deprecate cub::DeviceSpmv (NVIDIA#3320) Fixes: NVIDIA#896 Improves `DeviceSegmentedSort` test run time for large number of items and segments (NVIDIA#3246) * fixes segment offset generation * switches to analytical verification * switches to analytical verification for pairs * fixes spelling * adds tests for large number of segments * fixes narrowing conversion in tests * addresses review comments * fixes includes Compile basic infra test with C++17 (NVIDIA#3377) Adds support for large number of items and large number of segments to `DeviceSegmentedSort` (NVIDIA#3308) * fixes segment offset generation * switches to analytical verification * switches to analytical verification for pairs * addresses review comments * introduces segment offset type * adds tests for large number of segments * adds support for large number of segments * drops segment offset type * fixes thrust namespace * removes about-to-be-deprecated cub iterators * no exec specifier on defaulted ctor * fixes gcc7 linker error * uses local_segment_index_t throughout * determine offset type based on type returned by segment iterator begin/end iterators * minor style improvements Exit with error when RAPIDS CI fails. 
(NVIDIA#3385) cuda.parallel: Support structured types as algorithm inputs (NVIDIA#3218) * Introduce gpu_struct decorator and typing * Enable `reduce` to accept arrays of structs as inputs * Add test for reducing arrays-of-struct * Update documentation * Use a numpy array rather than ctypes object * Change zeros -> empty for output array and temp storage * Add a TODO for typing GpuStruct * Documentation udpates * Remove test_reduce_struct_type from test_reduce.py * Revert to `to_cccl_value()` accepting ndarray + GpuStruct * Bump copyrights --------- Co-authored-by: Ashwin Srinath <[email protected]> Deprecate thrust::async (NVIDIA#3324) Fixes: NVIDIA#100 Review/Deprecate CUB `util.ptx` for CCCL 2.x (NVIDIA#3342) Fix broken `_CCCL_BUILTIN_ASSUME` macro (NVIDIA#3314) * add compiler-specific path * fix device code path * add _CCC_ASSUME Deprecate thrust::numeric_limits (NVIDIA#3366) Replace `typedef` with `using` in libcu++ (NVIDIA#3368) Deprecate thrust::optional (NVIDIA#3307) Fixes: NVIDIA#3306 Upgrade to Catch2 3.8 (NVIDIA#3310) Fixes: NVIDIA#1724 refactor `<cuda/std/cstdint>` (NVIDIA#3325) Co-authored-by: Bernhard Manfred Gruber <[email protected]> Update CODEOWNERS (NVIDIA#3331) * Update CODEOWNERS * Update CODEOWNERS * Update CODEOWNERS * [pre-commit.ci] auto code formatting --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix sign-compare warning (NVIDIA#3408) Implement more cmath functions to be usable on host and device (NVIDIA#3382) * Implement more cmath functions to be usable on host and device * Implement math roots functions * Implement exponential functions Redefine and deprecate thrust::remove_cvref (NVIDIA#3394) * Redefine and deprecate thrust::remove_cvref Co-authored-by: Michael Schellenberger Costa <[email protected]> Fix assert definition for NVHPC due to constexpr issues (NVIDIA#3418) NVHPC cannot decide at compile time where the code would run so _CCCL_ASSERT within a constexpr function breaks 
it. Fix this by always using the host definition which should also work on device. Fixes NVIDIA#3411 Extend CUB reduce benchmarks (NVIDIA#3401) * Rename max.cu to custom.cu, since it uses a custom operator * Extend types covered my min.cu to all fundamental types * Add some notes on how to collect tuning parameters Fixes: NVIDIA#3283 Update upload-pages-artifact to v3 (NVIDIA#3423) * Update upload-pages-artifact to v3 * Empty commit --------- Co-authored-by: Ashwin Srinath <[email protected]> Replace and deprecate thrust::cuda_cub::terminate (NVIDIA#3421) `std::linalg` accessors and `transposed_layout` (NVIDIA#2962) Add round up/down to multiple (NVIDIA#3234) [FEA]: Introduce Python module with CCCL headers (NVIDIA#3201) * Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative * Run `copy_cccl_headers_to_aude_include()` before `setup()` * Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path. * Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel * Bug fix: cuda/_include only exists after shutil.copytree() ran. * Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py * Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions) * Replace := operator (needs Python 3.8+) * Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md * Restore original README.md: `pip3 install -e` now works on first pass. * cuda_cccl/README.md: FOR INTERNAL USE ONLY * Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under NVIDIA#3201 (comment)) Command used: ci/update_version.sh 2 8 0 * Modernize pyproject.toml, setup.py Trigger for this change: * NVIDIA#3201 (comment) * NVIDIA#3201 (comment) * Install CCCL headers under cuda.cccl.include Trigger for this change: * NVIDIA#3201 (comment) Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely. 
* Factor out cuda_cccl/cuda/cccl/include_paths.py * Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative * Add missing Copyright notice. * Add missing __init__.py (cuda.cccl) * Add `"cuda.cccl"` to `autodoc.mock_imports` * Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.) * Add # TODO: move this to a module-level import * Modernize cuda_cooperative/pyproject.toml, setup.py * Convert cuda_cooperative to use hatchling as build backend. * Revert "Convert cuda_cooperative to use hatchling as build backend." This reverts commit 61637d6. * Move numpy from [build-system] requires -> [project] dependencies * Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH * Remove copy_license() and use license_files=["../../LICENSE"] instead. * Further modernize cuda_cccl/setup.py to use pathlib * Trivial simplifications in cuda_cccl/pyproject.toml * Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code * Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml * Add taplo-pre-commit to .pre-commit-config.yaml * taplo-pre-commit auto-fixes * Use pathlib in cuda_cooperative/setup.py * CCCL_PYTHON_PATH in cuda_cooperative/setup.py * Modernize cuda_parallel/pyproject.toml, setup.py * Use pathlib in cuda_parallel/setup.py * Add `# TOML lint & format` comment. 
* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml * Use pathlib in cuda/cccl/include_paths.py * pre-commit autoupdate (EXCEPT clang-format, which was manually restored) * Fixes after git merge main * Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result' ``` =========================================================================== warnings summary =========================================================================== tests/test_reduce.py::test_reduce_non_contiguous /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080> Traceback (most recent call last): File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__ bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result)) ^^^^^^^^^^^^^^^^^ AttributeError: '_Reduce' object has no attribute 'build_result' warnings.warn(pytest.PytestUnraisableExceptionWarning(msg)) -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ============================================================== ``` * Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy` * Introduce cuda_cooperative/constraints.txt * Also add cuda_parallel/constraints.txt * Add `--constraint constraints.txt` in ci/test_python.sh * Update Copyright dates * Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024) For completeness: The other repo took a long time to install into the pre-commit cache; so long it lead to timeouts in the CCCL CI. * Remove unused cuda_parallel jinja2 dependency (noticed by chance). 
* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead. * Make cuda_cooperative, cuda_parallel testing completely independent. * Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Fix sign-compare warning (NVIDIA#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]" This reverts commit ea33a21. Error message: NVIDIA#3201 (comment) * Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Restore original ci/matrix.yaml [skip-rapids] * Use for loop in test_python.sh to avoid code duplication. * Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci] * Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]" This reverts commit ec206fd. 
* Implement suggestion by @shwina (NVIDIA#3201 (review)) * Address feedback by @leofang --------- Co-authored-by: Bernhard Manfred Gruber <[email protected]> cuda.parallel: Add optional stream argument to reduce_into() (NVIDIA#3348) * Add optional stream argument to reduce_into() * Add tests to check for reduce_into() stream behavior * Move protocol related utils to separate file and rework __cuda_stream__ error messages * Fix synchronization issue in stream test and add one more invalid stream test case * Rename cuda stream validation function after removing leading underscore * Unpack values from __cuda_stream__ instead of indexing * Fix linting errors * Handle TypeError when unpacking invalid __cuda_stream__ return * Use stream to allocate cupy memory in new stream test Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (NVIDIA#3434) Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (NVIDIA#3419) * Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ Fixes NVIDIA#3404 Fix CI issues (NVIDIA#3443) Remove deprecated `cub::min` (NVIDIA#3450) * Remove deprecated `cuda::{min,max}` * Drop unused `thrust::remove_cvref` file Fix typo in builtin (NVIDIA#3451) Moves agents to `detail::<algorithm_name>` namespace (NVIDIA#3435) uses unsigned offset types in thrust's scan dispatch (NVIDIA#3436) Default transform_iterator's copy ctor (NVIDIA#3395) Fixes: NVIDIA#2393 Turn C++ dialect warning into error (NVIDIA#3453) Uses unsigned offset types in thrust's sort algorithm calling into `DispatchMergeSort` (NVIDIA#3437) * uses thrust's dynamic dispatch for merge_sort * [pre-commit.ci] auto code formatting --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Refactor allocator handling of contiguous_storage (NVIDIA#3050) Co-authored-by: Michael Schellenberger Costa <[email protected]> Drop thrust::detail::integer_traits (NVIDIA#3391) Add cuda::is_floating_point 
supporting half and bfloat (NVIDIA#3379) Co-authored-by: Michael Schellenberger Costa <[email protected]> Improve docs of std headers (NVIDIA#3416) Drop C++11 and C++14 support for all of cccl (NVIDIA#3417) * Drop C++11 and C++14 support for all of cccl --------- Co-authored-by: Bernhard Manfred Gruber <[email protected]> Deprecate a few CUB macros (NVIDIA#3456) Deprecate thrust universal iterator categories (NVIDIA#3461) Fix launch args order (NVIDIA#3465) Add `--extended-lambda` to the list of removed clangd flags (NVIDIA#3432) add `_CCCL_HAS_NVFP8` macro (NVIDIA#3429) Add `_CCCL_BUILTIN_PREFETCH` (NVIDIA#3433) Drop universal iterator categories (NVIDIA#3474) Ensure that headers in `<cuda/*>` can be build with a C++ only compiler (NVIDIA#3472) Specialize __is_extended_floating_point for FP8 types (NVIDIA#3470) Also ensure that we actually can enable FP8 due to FP16 and BF16 requirements Co-authored-by: Michael Schellenberger Costa <[email protected]> Moves CUB kernel entry points to a detail namespace (NVIDIA#3468) * moves emptykernel to detail ns * second batch * third batch * fourth batch * fixes cuda parallel * concatenates nested namespaces Deprecate block/warp algo specializations (NVIDIA#3455) Fixes: NVIDIA#3409 Refactor CUB's util_debug (NVIDIA#3345)

Co-authored-by: Michael Schellenberger Costa <[email protected]>

* add `_CCCL_HAS_NVFP8` macro (#3429)
* Add cuda::is_floating_point supporting half and bfloat (#3379)
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Specialize __is_extended_floating_point for FP8 types (#3470)
  Also ensure that we actually can enable FP8 due to FP16 and BF16 requirements
  Co-authored-by: Michael Schellenberger Costa <[email protected]>

---------

Co-authored-by: Federico Busato <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Cleanup util_arch (NVIDIA#2773)

Improves `DeviceSegmentedSort` test run time for large number of items and segments (NVIDIA#3246)
* fixes segment offset generation
* switches to analytical verification
* switches to analytical verification for pairs
* fixes spelling
* adds tests for large number of segments
* fixes narrowing conversion in tests
* addresses review comments
* fixes includes

Adds support for large number of items and large number of segments to `DeviceSegmentedSort` (NVIDIA#3308)
* fixes segment offset generation
* switches to analytical verification
* switches to analytical verification for pairs
* addresses review comments
* introduces segment offset type
* adds tests for large number of segments
* adds support for large number of segments
* drops segment offset type
* fixes thrust namespace
* removes about-to-be-deprecated cub iterators
* no exec specifier on defaulted ctor
* fixes gcc7 linker error
* uses local_segment_index_t throughout
* determine offset type based on type returned by segment iterator begin/end iterators
* minor style improvements

cuda.parallel: Support structured types as algorithm inputs (NVIDIA#3218)
* Introduce gpu_struct decorator and typing
* Enable `reduce` to accept arrays of structs as inputs
* Add test for reducing arrays-of-struct
* Update documentation
* Use a numpy array rather than ctypes object
* Change zeros -> empty for output array and temp storage
* Add a TODO for typing GpuStruct
* Documentation udpates
* Remove test_reduce_struct_type from test_reduce.py
* Revert to `to_cccl_value()` accepting ndarray + GpuStruct
* Bump copyrights
---------
Co-authored-by: Ashwin Srinath <[email protected]>

Deprecate thrust::async (NVIDIA#3324)
Fixes: NVIDIA#100

Review/Deprecate CUB `util.ptx` for CCCL 2.x (NVIDIA#3342)

Deprecate thrust::numeric_limits (NVIDIA#3366)

Upgrade to Catch2 3.8 (NVIDIA#3310)
Fixes: NVIDIA#1724

Fix sign-compare warning (NVIDIA#3408)

Implement more cmath functions to be usable on host and device (NVIDIA#3382)
* Implement more cmath functions to be usable on host and device
* Implement math roots functions
* Implement exponential functions

Redefine and deprecate thrust::remove_cvref (NVIDIA#3394)
* Redefine and deprecate thrust::remove_cvref
Co-authored-by: Michael Schellenberger Costa <[email protected]>

cuda.parallel: Add optional stream argument to reduce_into() (NVIDIA#3348)
* Add optional stream argument to reduce_into()
* Add tests to check for reduce_into() stream behavior
* Move protocol related utils to separate file and rework __cuda_stream__ error messages
* Fix synchronization issue in stream test and add one more invalid stream test case
* Rename cuda stream validation function after removing leading underscore
* Unpack values from __cuda_stream__ instead of indexing
* Fix linting errors
* Handle TypeError when unpacking invalid __cuda_stream__ return
* Use stream to allocate cupy memory in new stream test

Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (NVIDIA#3419)
* Deprecate `cub::{min, max}` and replace internal uses with those from libcu++
Fixes NVIDIA#3404

Remove deprecated `cub::min` (NVIDIA#3450)
* Remove deprecated `cuda::{min,max}`
* Drop unused `thrust::remove_cvref` file

Fix typo in builtin (NVIDIA#3451)

Moves agents to `detail::<algorithm_name>` namespace (NVIDIA#3435)

Drop thrust::detail::integer_traits (NVIDIA#3391)

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)
Co-authored-by: Michael Schellenberger Costa <[email protected]>

add `_CCCL_HAS_NVFP8` macro (NVIDIA#3429)

Specialize __is_extended_floating_point for FP8 types (NVIDIA#3470)
Also ensure that we actually can enable FP8 due to FP16 and BF16 requirements
Co-authored-by: Michael Schellenberger Costa <[email protected]>

Moves CUB kernel entry points to a detail namespace (NVIDIA#3468)
* moves emptykernel to detail ns
* second batch
* third batch
* fourth batch
* fixes cuda parallel
* concatenates nested namespaces

Deprecate block/warp algo specializations (NVIDIA#3455)
Fixes: NVIDIA#3409

fix documentation

Co-authored-by: Michael Schellenberger Costa <[email protected]>

bernhardmgruber requested review from a team as code owners January 14, 2025 09:47

bernhardmgruber requested review from wmaxey and alliepiper January 14, 2025 09:47

bernhardmgruber mentioned this pull request Jan 14, 2025

Specialize relevant cuda::(std::) types for __half/bfloat16/fp8 #525

Open

9 tasks

miscco requested changes Jan 14, 2025

bernhardmgruber changed the title ~~Specialize is_floating_point for half and bfloat~~ Add cuda::is_floating_point supporting half and bfloat Jan 21, 2025
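
The retitling reflects the design settled on in review: the standard trait stays untouched and a separate trait in `cuda::` defaults to it while adding the half-precision types. A minimal sketch of that pattern follows, assuming plain `std::` in place of `cuda::std::` and two hypothetical stand-in structs (`half_t`, `bfloat16_t`) in place of `__half` and `__nv_bfloat16`, so that it compiles without the CUDA toolkit. It is not the code merged in this PR.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-ins for __half and __nv_bfloat16 (assumption: the real
// types come from the CUDA toolkit headers, which are not used here).
struct half_t { unsigned short bits; };
struct bfloat16_t { unsigned short bits; };

// Primary template: defer to the standard trait, so the behavior for
// float, double, long double and all other types is unchanged.
template <class T>
struct is_floating_point : std::is_floating_point<T> {};

// Extension: additionally report the half-precision types as floating point.
// Note: this sketch does not handle cv-qualified half_t / bfloat16_t.
template <>
struct is_floating_point<half_t> : std::true_type {};
template <>
struct is_floating_point<bfloat16_t> : std::true_type {};

template <class T>
inline constexpr bool is_floating_point_v = is_floating_point<T>::value;

} // namespace sketch

// The extended trait accepts the stand-in types; the standard trait does not.
static_assert(sketch::is_floating_point_v<float>);
static_assert(sketch::is_floating_point_v<sketch::half_t>);
static_assert(sketch::is_floating_point_v<sketch::bfloat16_t>);
static_assert(!std::is_floating_point_v<sketch::half_t>);
static_assert(!sketch::is_floating_point_v<int>);
```

Keeping the extension in its own namespace means code that relies on the strictly standard behavior of the `std` trait is unaffected, which matches the concern raised in the review discussion.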

bernhardmgruber force-pushed the half_limits branch 2 times, most recently from 3f0866f to a3d5e27 Compare January 21, 2025 10:54

bernhardmgruber commented Jan 21, 2025

libcudacxx/include/cuda/__type_traits/is_floating_point.h (outdated, resolved)

miscco reviewed Jan 21, 2025

libcudacxx/include/cuda/__type_traits/is_floating_point.h (outdated, resolved)

libcudacxx/include/cuda/__type_traits/is_floating_point.h (outdated, resolved)

bernhardmgruber force-pushed the half_limits branch from a3d5e27 to c4c91f4 Compare January 21, 2025 11:23

miscco reviewed Jan 21, 2025

libcudacxx/include/cuda/__type_traits/is_floating_point.h (outdated, resolved)

libcudacxx/include/cuda/__type_traits/is_floating_point.h (outdated, resolved)

Add cuda::is_floating_point supporting half and bfloat

7e396f5

Co-authored-by: Michael Schellenberger Costa <[email protected]>

bernhardmgruber force-pushed the half_limits branch from f73b1fc to 7e396f5 Compare January 21, 2025 13:32

miscco approved these changes Jan 21, 2025

bernhardmgruber merged commit 4812f28 into NVIDIA:main Jan 21, 2025
156 of 159 checks passed

bernhardmgruber deleted the half_limits branch January 21, 2025 15:39

bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this pull request Jan 22, 2025

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)

432a060

Co-authored-by: Michael Schellenberger Costa <[email protected]>

bernhardmgruber mentioned this pull request Jan 22, 2025

Backport to 2.8: some FP8 support #3479

Merged

davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 22, 2025

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)

f05b524

Co-authored-by: Michael Schellenberger Costa <[email protected]>

davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 22, 2025

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)

c89cf67

Co-authored-by: Michael Schellenberger Costa <[email protected]>

davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 23, 2025

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)

e494e4f

Co-authored-by: Michael Schellenberger Costa <[email protected]>

davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 29, 2025

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)

deccc4f

Co-authored-by: Michael Schellenberger Costa <[email protected]>
