Implement cuda::isqrt #4427

Open
wants to merge 4 commits into
base: main

@davebayer davebayer commented Apr 12, 2025

This PR introduces the cuda::isqrt function, which computes the integer square root of a given input.

The implementation is based on the reference implementation from the P3605R0 proposal. I am a bit unsure whether I am able to reuse it.
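
For orientation, the integer square root of n is the largest integer r with r * r <= n. The sketch below is a minimal reference for that definition, assuming a 32-bit unsigned input. The name isqrt_reference is hypothetical and used only for illustration; this is neither the code in this PR nor the P3605R0 reference implementation.

```cpp
#include <cstdint>

// Hypothetical reference helper: returns the largest r such that r * r <= n.
// A binary search keeps the definition easy to check by hand; it is not
// meant to be fast.
constexpr std::uint32_t isqrt_reference(std::uint32_t n)
{
  std::uint64_t lo = 0;     // invariant: lo * lo <= n
  std::uint64_t hi = 65536; // invariant: hi * hi >  n, since 65536^2 = 2^32
  while (hi - lo > 1)
  {
    const std::uint64_t mid = lo + (hi - lo) / 2;
    if (mid * mid <= n)
    {
      lo = mid;
    }
    else
    {
      hi = mid;
    }
  }
  return static_cast<std::uint32_t>(lo);
}
```

A slow reference like this is mainly useful as an oracle when testing a faster implementation.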

@davebayer davebayer requested review from a team as code owners April 12, 2025 05:22
@github-project-automation github-project-automation bot moved this to Todo in CCCL Apr 12, 2025

copy-pr-bot bot commented Apr 12, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 12, 2025
Comment on lines 54 to 64
_Up __current{0};
_Up __next{_Up(_Up{1} << ((_CUDA_VSTD::bit_width(_Up(__v - 1)) + 1) / 2))};

do
{
  __current = __next;
  __next = _Up((__current + _Up(__v) / __current) / 2);
} while (__next < __current);

return static_cast<_Tp>(__current);
}
Contributor

We could also just cast to the respective floating point and then cast back?

Contributor

That's tricky. Converting to float is worth it only if --prec-sqrt=false is used. On the other hand, this cannot be detected. We need to check with the compiler team.

Contributor

Two other notes: PTX sqrt can be used to select the fast-math (approx) mode directly. We need to be very careful with numbers >= 2^23 because of the loss of precision in the conversion to floating point.
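
The precision concern can be made concrete with a small host-side sketch. It assumes IEEE-754 binary32 and binary64 for float and double; a float has a 24-bit significand, so 2^24 + 1 is the first positive integer that does not survive a round trip through float. The helper names are hypothetical, and the fix-up loops are one possible way to make a floating-point estimate exact, not what this PR does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Returns true if n is unchanged after converting to float and back.
// The cast back goes through uint64_t so that a value rounded up to 2^32
// is still in range.
inline bool survives_float_round_trip(std::uint32_t n)
{
  return static_cast<std::uint64_t>(static_cast<float>(n)) == n;
}

// Hypothetical helper: estimate with double-precision sqrt, then correct
// the estimate so the result is exact even if it is off by one.
inline std::uint32_t isqrt_via_double(std::uint32_t n)
{
  // Every uint32_t value is exactly representable in a 53-bit significand.
  std::uint64_t r = static_cast<std::uint64_t>(std::sqrt(static_cast<double>(n)));
  while (r * r > n)
  {
    --r; // estimate was too large
  }
  while ((r + 1) * (r + 1) <= n)
  {
    ++r; // estimate was too small
  }
  assert(r * r <= n && (r + 1) * (r + 1) > n);
  return static_cast<std::uint32_t>(r);
}
```

Whether such a path beats an integer-only loop on the device depends on the square-root precision mode discussed above, which this sketch does not model.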

Contributor Author

I've implemented the digit-by-digit algorithm, which uses only bit shifts and additions/subtractions and takes at most nbits / 2 - 1 steps. Check it out!
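
For readers unfamiliar with the technique, here is a minimal sketch of the textbook binary digit-by-digit square root for a 32-bit unsigned input. It is an illustration under that assumption, with a hypothetical function name; the version in this PR may be structured differently and may take a different number of steps.

```cpp
#include <cstdint>

// Textbook digit-by-digit (binary) integer square root.
// Uses only shifts, comparisons, additions, and subtractions.
constexpr std::uint32_t isqrt_digit_by_digit(std::uint32_t n)
{
  std::uint32_t result = 0;
  // Start at the highest power of four that fits in 32 bits ...
  std::uint32_t bit = std::uint32_t{1} << 30;
  // ... and lower it to the highest power of four not exceeding n.
  while (bit > n)
  {
    bit >>= 2;
  }
  // Decide one result bit per iteration, from most to least significant.
  while (bit != 0)
  {
    if (n >= result + bit)
    {
      n -= result + bit;
      result = (result >> 1) + bit;
    }
    else
    {
      result >>= 1;
    }
    bit >>= 2;
  }
  return result;
}
```

Because there is no division, the loop has a fixed upper bound on iterations that depends only on the bit width of the type.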

miscco commented Apr 14, 2025

/ok to test 8730622

Contributor

🟩 CI finished in 2h 22m: Pass: 100%/170 | Total: 3d 16h | Avg: 31m 18s | Max: 1h 37m | Hits: 69%/269901
  • 🟩 cub: Pass: 100%/47 | Total: 2d 00h | Avg: 1h 02m | Max: 1h 32m | Hits: 30%/56545

    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  1d 22h | Avg:  1h 02m | Max:  1h 32m | Hits:  30%/54087 
      🟩 arm64              Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 09m | Hits:  16%/2458  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 45m | Avg:  1h 09m | Max:  1h 16m | Hits:  15%/5974  
      🟩 12.8               Pass: 100%/42  | Total:  1d 19h | Avg:  1h 01m | Max:  1h 32m | Hits:  32%/50571 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 06m | Hits:  15%/2120  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 45m | Avg:  1h 09m | Max:  1h 16m | Hits:  15%/5974  
      🟩 nvcc12.8           Pass: 100%/40  | Total:  1d 16h | Avg:  1h 01m | Max:  1h 32m | Hits:  32%/48451 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 06m | Hits:  15%/2120  
      🟩 nvcc               Pass: 100%/45  | Total:  1d 22h | Avg:  1h 02m | Max:  1h 32m | Hits:  30%/54425 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 30m | Avg:  1h 07m | Max:  1h 09m | Hits:  16%/4924  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 13m | Hits:  16%/2458  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 08m | Hits:  16%/2458  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits:  16%/2458  
      🟩 Clang18            Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 07m | Hits:  16%/2458  
      🟩 Clang19            Pass: 100%/7   | Total:  6h 36m | Avg: 56m 35s | Max:  1h 25m | Hits:  40%/8265  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 18m | Hits:  16%/2462  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 05m | Avg:  1h 05m | Max:  1h 05m | Hits:  16%/1231  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 10m | Hits:  16%/2462  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 11m | Hits:  16%/2462  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m | Hits:  15%/2458  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 13m | Hits:  15%/2458  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 55m | Avg: 43m 15s | Max:  1h 21m | Hits:  61%/13519 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 37m | Avg:  1h 18m | Max:  1h 21m | Hits:  12%/2100  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 53m | Avg:  1h 26m | Max:  1h 32m | Hits:  12%/2100  
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  2h 41m | Avg:  1h 20m | Max:  1h 20m | Hits:  12%/2272  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 19h 59m | Avg:  1h 03m | Max:  1h 25m | Hits:  25%/23021 
      🟩 GCC                Pass: 100%/22  | Total: 20h 39m | Avg: 56m 20s | Max:  1h 21m | Hits:  38%/27052 
      🟩 MSVC               Pass: 100%/4   | Total:  5h 31m | Avg:  1h 22m | Max:  1h 32m | Hits:  12%/4200  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 41m | Avg:  1h 20m | Max:  1h 20m | Hits:  12%/2272  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 18m | Avg: 26m 15s | Max: 30m 00s | Hits:  71%/3687  
      🟩 rtx2080            Pass: 100%/36  | Total:  1d 18h | Avg:  1h 11m | Max:  1h 32m | Hits:  15%/43026 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 46m | Avg: 35m 46s | Max:  1h 10m | Hits:  78%/9832  
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  1d 21h | Avg:  1h 10m | Max:  1h 32m | Hits:  15%/46713 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 28m 42s | Avg: 28m 42s | Max: 28m 42s | Hits:  99%/1229  
      🟩 GraphCapture       Pass: 100%/1   | Total: 21m 13s | Avg: 21m 13s | Max: 21m 13s | Hits:  99%/1229  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 19m | Avg: 26m 37s | Max: 27m 13s | Hits:  99%/3687  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 08m | Avg: 22m 43s | Max: 24m 35s | Hits:  99%/3687  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 18m | Avg: 26m 15s | Max: 30m 00s | Hits:  71%/3687  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 21m | Avg:  1h 21m | Max:  1h 21m | Hits:  15%/1229  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  1d 00h | Avg:  1h 11m | Max:  1h 25m | Hits:  15%/25026 
      🟩 20                 Pass: 100%/26  | Total: 23h 53m | Avg: 55m 08s | Max:  1h 32m | Hits:  42%/31519 
    
  • 🟩 thrust: Pass: 100%/47 | Total: 1d 04h | Avg: 36m 15s | Max: 1h 31m | Hits: 56%/83463

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 40m 31s | Avg: 20m 15s | Max: 29m 07s | Hits:  73%/3554  
    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  1d 03h | Avg: 36m 28s | Max:  1h 31m | Hits:  56%/79910 
      🟩 arm64              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 07s | Max: 32m 25s | Hits:  47%/3553  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 26m | Avg: 41m 13s | Max:  1h 04m | Hits:  54%/8876  
      🟩 12.8               Pass: 100%/42  | Total:  1d 00h | Avg: 35m 39s | Max:  1h 31m | Hits:  56%/74587 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  1h 01m | Avg: 30m 33s | Max: 30m 50s | Hits:  48%/3552  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 26m | Avg: 41m 13s | Max:  1h 04m | Hits:  54%/8876  
      🟩 nvcc12.8           Pass: 100%/40  | Total: 23h 56m | Avg: 35m 54s | Max:  1h 31m | Hits:  57%/71035 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 01m | Avg: 30m 33s | Max: 30m 50s | Hits:  48%/3552  
      🟩 nvcc               Pass: 100%/45  | Total:  1d 03h | Avg: 36m 30s | Max:  1h 31m | Hits:  56%/79911 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 19m | Avg: 34m 45s | Max: 35m 48s | Hits:  57%/7104  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 10m | Avg: 35m 12s | Max: 36m 00s | Hits:  48%/3552  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 12m | Avg: 36m 24s | Max: 37m 34s | Hits:  48%/3552  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 21s | Max: 34m 11s | Hits:  48%/3552  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 13m | Avg: 36m 47s | Max: 38m 51s | Hits:  48%/3552  
      🟩 Clang19            Pass: 100%/7   | Total:  2h 53m | Avg: 24m 50s | Max: 33m 42s | Hits:  64%/12432 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 32s | Max: 33m 47s | Hits:  58%/3554  
      🟩 GCC8               Pass: 100%/1   | Total: 36m 53s | Avg: 36m 53s | Max: 36m 53s | Hits:  47%/1777  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 13m | Avg: 36m 48s | Max: 37m 40s | Hits:  61%/3554  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 19m | Avg: 39m 53s | Max: 40m 17s | Hits:  47%/3554  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 17m | Avg: 38m 45s | Max: 40m 22s | Hits:  47%/3554  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 23s | Max: 36m 25s | Hits:  47%/3554  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 45m | Avg: 22m 33s | Max: 34m 52s | Hits:  74%/17770 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 04m | Hits:  37%/3540  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 54m | Avg: 58m 03s | Max:  1h 17m | Hits:  50%/5310  
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  2h 52m | Avg:  1h 26m | Max:  1h 31m | Hits:  25%/3552  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  9h 56m | Avg: 31m 23s | Max: 38m 51s | Hits:  55%/33744 
      🟩 GCC                Pass: 100%/21  | Total: 10h 31m | Avg: 30m 03s | Max: 40m 22s | Hits:  62%/37317 
      🟩 MSVC               Pass: 100%/5   | Total:  5h 03m | Avg:  1h 00m | Max:  1h 17m | Hits:  44%/8850  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 52m | Avg:  1h 26m | Max:  1h 31m | Hits:  25%/3552  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 32m 37s | Avg: 16m 18s | Max: 20m 33s | Hits:  73%/3554  
      🟩 rtx2080            Pass: 100%/35  | Total: 23h 53m | Avg: 40m 58s | Max:  1h 31m | Hits:  48%/62156 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 57m | Avg: 23m 43s | Max:  1h 07m | Hits:  80%/17753 
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  1d 02h | Avg: 40m 20s | Max:  1h 31m | Hits:  48%/71033 
      🟩 TestCPU            Pass: 100%/3   | Total: 44m 50s | Avg: 14m 56s | Max: 28m 36s | Hits:  99%/5323  
      🟩 TestGPU            Pass: 100%/4   | Total: 45m 22s | Avg: 11m 20s | Max: 12m 04s | Hits:  99%/7107  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 32m 37s | Avg: 16m 18s | Max: 20m 33s | Hits:  73%/3554  
      🟩 90;90a;100         Pass: 100%/1   | Total: 34m 07s | Avg: 34m 07s | Max: 34m 07s | Hits:  75%/1777  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total: 15h 05m | Avg: 43m 06s | Max:  1h 31m | Hits:  47%/37287 
      🟩 20                 Pass: 100%/24  | Total: 12h 38m | Avg: 31m 35s | Max:  1h 21m | Hits:  62%/42622 
    
  • 🟩 libcudacxx: Pass: 100%/45 | Total: 6h 06m | Avg: 8m 08s | Max: 25m 45s | Hits: 95%/116193

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  5h 58m | Avg:  8m 20s | Max: 25m 45s | Hits:  95%/110166
      🟩 arm64              Pass: 100%/2   | Total:  7m 56s | Avg:  3m 58s | Max:  4m 02s | Hits:  98%/6027  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 37m 20s | Avg:  7m 28s | Max: 22m 07s | Hits:  99%/14685 
      🟩 12.8               Pass: 100%/40  | Total:  5h 29m | Avg:  8m 13s | Max: 25m 45s | Hits:  94%/101508
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 46m 41s | Avg: 23m 20s | Max: 24m 14s | Hits:  27%/5987  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 37m 20s | Avg:  7m 28s | Max: 22m 07s | Hits:  99%/14685 
      🟩 nvcc12.8           Pass: 100%/38  | Total:  4h 42m | Avg:  7m 25s | Max: 25m 45s | Hits:  98%/95521 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 46m 41s | Avg: 23m 20s | Max: 24m 14s | Hits:  27%/5987  
      🟩 nvcc               Pass: 100%/43  | Total:  5h 19m | Avg:  7m 26s | Max: 25m 45s | Hits:  98%/110206
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 18m 23s | Avg:  4m 35s | Max:  5m 29s | Hits:  98%/11942 
      🟩 Clang15            Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  4m 43s | Hits:  99%/5983  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  5m 03s | Hits:  99%/5983  
      🟩 Clang17            Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  4m 56s | Hits:  99%/5983  
      🟩 Clang18            Pass: 100%/2   | Total:  9m 28s | Avg:  4m 44s | Max:  4m 54s | Hits:  98%/5983  
      🟩 Clang19            Pass: 100%/6   | Total:  1h 08m | Avg: 11m 29s | Max: 24m 14s | Hits:  70%/14983 
      🟩 GCC7               Pass: 100%/2   | Total:  7m 49s | Avg:  3m 54s | Max:  4m 16s | Hits:  99%/5919  
      🟩 GCC8               Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s | Hits:  99%/2970  
      🟩 GCC9               Pass: 100%/2   | Total:  7m 56s | Avg:  3m 58s | Max:  4m 02s | Hits:  99%/5931  
      🟩 GCC10              Pass: 100%/2   | Total:  8m 35s | Avg:  4m 17s | Max:  4m 30s | Hits:  98%/5989  
      🟩 GCC11              Pass: 100%/2   | Total:  8m 04s | Avg:  4m 02s | Max:  4m 05s | Hits:  98%/5985  
      🟩 GCC12              Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  4m 48s | Hits:  98%/5985  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 20m | Avg:  8m 03s | Max: 16m 22s | Hits:  98%/15245 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 44m 24s | Avg: 22m 12s | Max: 22m 17s | Hits:  98%/5633  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 48m 36s | Avg: 24m 18s | Max: 25m 45s | Hits:  98%/5706  
      🟩 NVHPC25.3          Pass: 100%/2   | Total: 21m 37s | Avg: 10m 48s | Max: 11m 08s | Hits:  98%/5973  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/18  | Total:  2h 05m | Avg:  6m 59s | Max: 24m 14s | Hits:  90%/50857 
      🟩 GCC                Pass: 100%/21  | Total:  2h 05m | Avg:  5m 59s | Max: 16m 22s | Hits:  98%/48024 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 33m | Avg: 23m 15s | Max: 25m 45s | Hits:  98%/11339 
      🟩 NVHPC              Pass: 100%/2   | Total: 21m 37s | Avg: 10m 48s | Max: 11m 08s | Hits:  98%/5973  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 18m 09s | Avg:  9m 04s | Max: 14m 05s | Hits:  98%/3103  
      🟩 rtx2080            Pass: 100%/43  | Total:  5h 48m | Avg:  8m 06s | Max: 25m 45s | Hits:  95%/113090
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  4h 59m | Avg:  7m 40s | Max: 25m 45s | Hits:  95%/116153
      🟩 NVRTC              Pass: 100%/2   | Total: 32m 34s | Avg: 16m 17s | Max: 16m 22s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 32m 15s | Avg: 10m 45s | Max: 14m 05s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 32m 34s | Avg: 16m 17s | Max: 16m 22s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 18m 09s | Avg:  9m 04s | Max: 14m 05s | Hits:  98%/3103  
      🟩 90;90a;100         Pass: 100%/1   | Total:  5m 01s | Avg:  5m 01s | Max:  5m 01s | Hits:  98%/3103  
    🟩 std
      🟩 17                 Pass: 100%/22  | Total:  3h 08m | Avg:  8m 33s | Max: 24m 14s | Hits:  95%/61897 
      🟩 20                 Pass: 100%/22  | Total:  2h 55m | Avg:  7m 59s | Max: 25m 45s | Hits:  94%/54296 
    
  • 🟩 cudax: Pass: 100%/24 | Total: 2h 58m | Avg: 7m 25s | Max: 14m 22s | Hits: 90%/13372

    🟩 cpu
      🟩 amd64              Pass: 100%/20  | Total:  2h 39m | Avg:  7m 58s | Max: 14m 22s | Hits:  90%/11044 
      🟩 arm64              Pass: 100%/4   | Total: 19m 00s | Avg:  4m 45s | Max:  5m 06s | Hits:  90%/2328  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 13m 45s | Avg: 13m 45s | Max: 13m 45s | Hits:  77%/284   
      🟩 12.8               Pass: 100%/23  | Total:  2h 44m | Avg:  7m 09s | Max: 14m 22s | Hits:  91%/13088 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 13m 45s | Avg: 13m 45s | Max: 13m 45s | Hits:  77%/284   
      🟩 nvcc12.8           Pass: 100%/23  | Total:  2h 44m | Avg:  7m 09s | Max: 14m 22s | Hits:  91%/13088 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/24  | Total:  2h 58m | Avg:  7m 25s | Max: 14m 22s | Hits:  90%/13372 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  5m 06s | Avg:  5m 06s | Max:  5m 06s | Hits:  90%/584   
      🟩 Clang15            Pass: 100%/1   | Total:  5m 08s | Avg:  5m 08s | Max:  5m 08s | Hits:  90%/582   
      🟩 Clang16            Pass: 100%/1   | Total:  5m 23s | Avg:  5m 23s | Max:  5m 23s | Hits:  90%/582   
      🟩 Clang17            Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s | Hits:  90%/582   
      🟩 Clang18            Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s | Hits:  90%/582   
      🟩 Clang19            Pass: 100%/4   | Total: 26m 39s | Avg:  6m 39s | Max: 12m 21s | Hits:  92%/2328  
      🟩 GCC10              Pass: 100%/1   | Total:  5m 58s | Avg:  5m 58s | Max:  5m 58s | Hits:  90%/584   
      🟩 GCC11              Pass: 100%/1   | Total:  5m 54s | Avg:  5m 54s | Max:  5m 54s | Hits:  90%/582   
      🟩 GCC12              Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s | Hits:  90%/582   
      🟩 GCC13              Pass: 100%/8   | Total: 57m 32s | Avg:  7m 11s | Max: 14m 22s | Hits:  92%/4656  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 13m 45s | Avg: 13m 45s | Max: 13m 45s | Hits:  77%/284   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 13m 45s | Avg: 13m 45s | Max: 13m 45s | Hits:  77%/284   
      🟩 NVHPC25.3          Pass: 100%/2   | Total: 22m 32s | Avg: 11m 16s | Max: 11m 26s | Hits:  88%/1160  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/9   | Total: 53m 19s | Avg:  5m 55s | Max: 12m 21s | Hits:  91%/5240  
      🟩 GCC                Pass: 100%/11  | Total:  1h 15m | Avg:  6m 49s | Max: 14m 22s | Hits:  91%/6404  
      🟩 MSVC               Pass: 100%/2   | Total: 27m 30s | Avg: 13m 45s | Max: 13m 45s | Hits:  77%/568   
      🟩 NVHPC              Pass: 100%/2   | Total: 22m 32s | Avg: 11m 16s | Max: 11m 26s | Hits:  88%/1160  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 18m 50s | Avg:  9m 25s | Max: 14m 22s | Hits:  94%/1164  
      🟩 rtx2080            Pass: 100%/22  | Total:  2h 39m | Avg:  7m 15s | Max: 13m 45s | Hits:  90%/12208 
    🟩 jobs
      🟩 Build              Pass: 100%/21  | Total:  2h 18m | Avg:  6m 35s | Max: 13m 45s | Hits:  89%/11626 
      🟩 Test               Pass: 100%/3   | Total: 39m 48s | Avg: 13m 16s | Max: 14m 22s | Hits:  99%/1746  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 23m 41s | Avg:  7m 53s | Max: 14m 22s | Hits:  93%/1746  
      🟩 90a                Pass: 100%/1   | Total:  4m 21s | Avg:  4m 21s | Max:  4m 21s | Hits:  90%/582   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 25m 33s | Avg:  6m 23s | Max: 11m 26s | Hits:  89%/2326  
      🟩 20                 Pass: 100%/20  | Total:  2h 32m | Avg:  7m 38s | Max: 14m 22s | Hits:  91%/11046 
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 19m 40s | Avg: 4m 55s | Max: 5m 57s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 10s | Avg:  5m 35s | Max:  5m 57s
      🟩 arm64              Pass: 100%/2   | Total:  8m 30s | Avg:  4m 15s | Max:  4m 17s
    🟩 ctk
      🟩 12.8               Pass: 100%/4   | Total: 19m 40s | Avg:  4m 55s | Max:  5m 57s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/4   | Total: 19m 40s | Avg:  4m 55s | Max:  5m 57s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 19m 40s | Avg:  4m 55s | Max:  5m 57s
    🟩 cxx
      🟩 NVHPC25.3          Pass: 100%/4   | Total: 19m 40s | Avg:  4m 55s | Max:  5m 57s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 19m 40s | Avg:  4m 55s | Max:  5m 57s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 19m 40s | Avg:  4m 55s | Max:  5m 57s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 19m 40s | Avg:  4m 55s | Max:  5m 57s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  9m 30s | Avg:  4m 45s | Max:  5m 13s
      🟩 20                 Pass: 100%/2   | Total: 10m 10s | Avg:  5m 05s | Max:  5m 57s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 24m 57s | Avg: 12m 28s | Max: 22m 13s | Hits: 96%/328

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 24m 57s | Avg: 12m 28s | Max: 22m 13s | Hits:  96%/328   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 24m 57s | Avg: 12m 28s | Max: 22m 13s | Hits:  96%/328   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 24m 57s | Avg: 12m 28s | Max: 22m 13s | Hits:  96%/328   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 24m 57s | Avg: 12m 28s | Max: 22m 13s | Hits:  96%/328   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 24m 57s | Avg: 12m 28s | Max: 22m 13s | Hits:  96%/328   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 24m 57s | Avg: 12m 28s | Max: 22m 13s | Hits:  96%/328   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 24m 57s | Avg: 12m 28s | Max: 22m 13s | Hits:  96%/328   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 44s | Avg:  2m 44s | Max:  2m 44s | Hits:  93%/164   
      🟩 Test               Pass: 100%/1   | Total: 22m 13s | Avg: 22m 13s | Max: 22m 13s | Hits:  98%/164   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 37m | Avg: 1h 37m | Max: 1h 37m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 37m | Avg:  1h 37m | Max:  1h 37m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 37m | Avg:  1h 37m | Max:  1h 37m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 37m | Avg:  1h 37m | Max:  1h 37m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 37m | Avg:  1h 37m | Max:  1h 37m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 37m | Avg:  1h 37m | Max:  1h 37m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 37m | Avg:  1h 37m | Max:  1h 37m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 37m | Avg:  1h 37m | Max:  1h 37m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 37m | Avg:  1h 37m | Max:  1h 37m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 170)

# Runner
121 linux-amd64-cpu16
15 windows-amd64-cpu16
12 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

}

_Up __current{0};
_Up __next{_Up(_Up{1} << ((_CUDA_VSTD::bit_width(_Up(__v - 1)) + 1) / 2))};
Contributor

Uniform initialization with dynamic values is a GCC extension. Also, I would prefer static_cast over a C-style cast.

Contributor

bit_width returns an int. We should convert it to unsigned to make the division more efficient.
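
As a rough illustration of both suggestions, the sketch below rewrites the quoted Newton iteration for a fixed 32-bit unsigned type, using static_cast and an unsigned bit width so that the division by 2 is a plain right shift. The names bit_width_u32 and isqrt_newton are hypothetical stand-ins, and the early return for n < 2 is an assumption about handling that is not visible in the quoted lines.

```cpp
#include <cstdint>

// Hypothetical stand-in for bit_width: the number of bits needed to
// represent v, returned as unsigned.
constexpr unsigned bit_width_u32(std::uint32_t v)
{
  unsigned width = 0;
  while (v != 0)
  {
    ++width;
    v >>= 1;
  }
  return width;
}

constexpr std::uint32_t isqrt_newton(std::uint32_t n)
{
  if (n < 2)
  {
    return n; // assumed special case; avoids dividing by zero below
  }
  // Unsigned arithmetic throughout, so "/ 2u" needs no sign correction.
  const unsigned shift = (bit_width_u32(n - 1u) + 1u) / 2u;
  std::uint32_t current = 0;
  // Initial estimate: a power of two that is >= sqrt(n).
  std::uint32_t next = static_cast<std::uint32_t>(std::uint32_t{1} << shift);
  do
  {
    current = next;
    next = static_cast<std::uint32_t>((current + n / current) / 2u);
  } while (next < current); // stop once the estimate no longer decreases
  return current;
}
```

Starting from an estimate at or above the true root means the sequence decreases monotonically until it reaches the floor of the square root, which is what the loop condition relies on.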

@github-project-automation github-project-automation bot moved this from In Review to In Progress in CCCL Apr 14, 2025

fbusato commented Apr 15, 2025

Please also check the developer forum discussion: https://forums.developer.nvidia.com/t/integer-square-root/198642

@davebayer davebayer requested a review from fbusato April 16, 2025 06:05

miscco commented Apr 16, 2025

/ok to test 9ed5906

Contributor

🟨 CI finished in 2h 04m: Pass: 74%/170 | Total: 3d 17h | Avg: 31m 25s | Max: 1h 42m | Hits: 49%/153708
  • 🟨 libcudacxx: Pass: 2%/45 | Total: 5h 41m | Avg: 7m 35s | Max: 28m 25s

    🟨 jobs
      🟥 Build              Pass:   0%/39  | Total:  5h 02m | Avg:  7m 45s | Max: 28m 25s
      🟥 NVRTC              Pass:   0%/2   | Total: 36m 20s | Avg: 18m 10s | Max: 20m 28s
      🟥 Test               Pass:   0%/3  
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 21s | Avg:  2m 21s | Max:  2m 21s
    🟨 cpu
      🟨 amd64              Pass:   2%/43  | Total:  5h 32m | Avg:  7m 44s | Max: 28m 25s
      🟥 arm64              Pass:   0%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 43s
    🟨 ctk
      🟥 12.0               Pass:   0%/5   | Total: 36m 04s | Avg:  7m 12s | Max: 19m 56s
      🟨 12.8               Pass:   2%/40  | Total:  5h 05m | Avg:  7m 38s | Max: 28m 25s
    🟨 cudacxx
      🟥 ClangCUDA19        Pass:   0%/2   | Total: 45m 21s | Avg: 22m 40s | Max: 24m 01s
      🟥 nvcc12.0           Pass:   0%/5   | Total: 36m 04s | Avg:  7m 12s | Max: 19m 56s
      🟨 nvcc12.8           Pass:   2%/38  | Total:  4h 20m | Avg:  6m 50s | Max: 28m 25s
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total: 45m 21s | Avg: 22m 40s | Max: 24m 01s
      🟨 nvcc               Pass:   2%/43  | Total:  4h 56m | Avg:  6m 53s | Max: 28m 25s
    🟨 cxx
      🟥 Clang14            Pass:   0%/4   | Total: 18m 09s | Avg:  4m 32s | Max:  4m 57s
      🟥 Clang15            Pass:   0%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  4m 54s
      🟥 Clang16            Pass:   0%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  5m 23s
      🟥 Clang17            Pass:   0%/2   | Total: 10m 20s | Avg:  5m 10s | Max:  5m 24s
      🟥 Clang18            Pass:   0%/2   | Total: 10m 09s | Avg:  5m 04s | Max:  5m 19s
      🟥 Clang19            Pass:   0%/6   | Total: 59m 40s | Avg:  9m 56s | Max: 24m 01s
      🟥 GCC7               Pass:   0%/2   | Total:  8m 05s | Avg:  4m 02s | Max:  4m 15s
      🟥 GCC8               Pass:   0%/1   | Total:  4m 17s | Avg:  4m 17s | Max:  4m 17s
      🟥 GCC9               Pass:   0%/2   | Total:  8m 23s | Avg:  4m 11s | Max:  4m 31s
      🟥 GCC10              Pass:   0%/2   | Total:  9m 24s | Avg:  4m 42s | Max:  4m 54s
      🟥 GCC11              Pass:   0%/2   | Total:  8m 37s | Avg:  4m 18s | Max:  4m 26s
      🟥 GCC12              Pass:   0%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  5m 12s
      🟨 GCC13              Pass:  10%/10  | Total:  1h 00m | Avg:  6m 05s | Max: 20m 28s
      🟥 MSVC14.29          Pass:   0%/2   | Total: 40m 40s | Avg: 20m 20s | Max: 20m 44s
      🟥 MSVC14.42          Pass:   0%/2   | Total: 52m 28s | Avg: 26m 14s | Max: 28m 25s
      🟥 NVHPC25.3          Pass:   0%/2   | Total: 20m 59s | Avg: 10m 29s | Max: 10m 52s
    🟨 cxx_family
      🟥 Clang              Pass:   0%/18  | Total:  1h 58m | Avg:  6m 34s | Max: 24m 01s
      🟨 GCC                Pass:   4%/21  | Total:  1h 49m | Avg:  5m 11s | Max: 20m 28s
      🟥 MSVC               Pass:   0%/4   | Total:  1h 33m | Avg: 23m 17s | Max: 28m 25s
      🟥 NVHPC              Pass:   0%/2   | Total: 20m 59s | Avg: 10m 29s | Max: 10m 52s
    🟨 gpu
      🟥 h100               Pass:   0%/2   | Total:  4m 30s | Avg:  2m 15s | Max:  4m 30s
      🟨 rtx2080            Pass:   2%/43  | Total:  5h 37m | Avg:  7m 50s | Max: 28m 25s
    🟥 sm
      🟥 75                 Pass:   0%/2   | Total: 36m 20s | Avg: 18m 10s | Max: 20m 28s
      🟥 90                 Pass:   0%/2   | Total:  4m 30s | Avg:  2m 15s | Max:  4m 30s
      🟥 90;90a;100         Pass:   0%/1   | Total:  4m 37s | Avg:  4m 37s | Max:  4m 37s
    🟥 std
      🟥 17                 Pass:   0%/22  | Total:  3h 10m | Avg:  8m 38s | Max: 24m 03s
      🟥 20                 Pass:   0%/22  | Total:  2h 29m | Avg:  6m 46s | Max: 28m 25s
    
  • 🟩 cub: Pass: 100%/47 | Total: 2d 01h | Avg: 1h 02m | Max: 1h 42m | Hits: 30%/56545

    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  1d 22h | Avg:  1h 02m | Max:  1h 42m | Hits:  30%/54087 
      🟩 arm64              Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 10m | Hits:  16%/2458  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  6h 06m | Avg:  1h 13m | Max:  1h 20m | Hits:  15%/5974  
      🟩 12.8               Pass: 100%/42  | Total:  1d 19h | Avg:  1h 01m | Max:  1h 42m | Hits:  32%/50571 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  15%/2120  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  6h 06m | Avg:  1h 13m | Max:  1h 20m | Hits:  15%/5974  
      🟩 nvcc12.8           Pass: 100%/40  | Total:  1d 17h | Avg:  1h 01m | Max:  1h 42m | Hits:  32%/48451 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  15%/2120  
      🟩 nvcc               Pass: 100%/45  | Total:  1d 23h | Avg:  1h 02m | Max:  1h 42m | Hits:  30%/54425 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 25m | Avg:  1h 06m | Max:  1h 09m | Hits:  16%/4924  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 11m | Hits:  16%/2458  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 06m | Hits:  16%/2458  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 04m | Hits:  16%/2458  
      🟩 Clang18            Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 10m | Hits:  16%/2458  
      🟩 Clang19            Pass: 100%/7   | Total:  6h 20m | Avg: 54m 24s | Max:  1h 12m | Hits:  41%/8265  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m | Hits:  16%/2462  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m | Hits:  16%/1231  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 13m | Hits:  16%/2462  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m | Hits:  16%/2462  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m | Hits:  15%/2458  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 11m | Hits:  15%/2458  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 44m | Avg: 42m 16s | Max:  1h 16m | Hits:  61%/13519 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 53m | Avg:  1h 26m | Max:  1h 33m | Hits:  12%/2100  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  3h 06m | Avg:  1h 33m | Max:  1h 42m | Hits:  12%/2100  
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  2h 48m | Avg:  1h 24m | Max:  1h 31m | Hits:  12%/2272  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 19h 43m | Avg:  1h 02m | Max:  1h 12m | Hits:  25%/23021 
      🟩 GCC                Pass: 100%/22  | Total: 20h 39m | Avg: 56m 21s | Max:  1h 16m | Hits:  38%/27052 
      🟩 MSVC               Pass: 100%/4   | Total:  5h 59m | Avg:  1h 29m | Max:  1h 42m | Hits:  12%/4200  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 48m | Avg:  1h 24m | Max:  1h 31m | Hits:  12%/2272  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 20m | Avg: 26m 48s | Max: 29m 31s | Hits:  71%/3687  
      🟩 rtx2080            Pass: 100%/36  | Total:  1d 19h | Avg:  1h 11m | Max:  1h 42m | Hits:  15%/43026 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 51m | Avg: 36m 25s | Max:  1h 16m | Hits:  78%/9832  
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  1d 21h | Avg:  1h 10m | Max:  1h 42m | Hits:  15%/46713 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 26m 28s | Avg: 26m 28s | Max: 26m 28s | Hits:  99%/1229  
      🟩 GraphCapture       Pass: 100%/1   | Total: 21m 33s | Avg: 21m 33s | Max: 21m 33s | Hits:  99%/1229  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 21m | Avg: 27m 16s | Max: 28m 20s | Hits:  99%/3687  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 10m | Avg: 23m 33s | Max: 23m 42s | Hits:  99%/3687  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 20m | Avg: 26m 48s | Max: 29m 31s | Hits:  71%/3687  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 13m | Avg:  1h 13m | Max:  1h 13m | Hits:  15%/1229  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  1d 01h | Avg:  1h 13m | Max:  1h 42m | Hits:  15%/25026 
      🟩 20                 Pass: 100%/26  | Total: 23h 31m | Avg: 54m 18s | Max:  1h 23m | Hits:  42%/31519 
    
  • 🟩 thrust: Pass: 100%/47 | Total: 1d 04h | Avg: 36m 58s | Max: 1h 33m | Hits: 56%/83463

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 42m 32s | Avg: 21m 16s | Max: 31m 34s | Hits:  73%/3554  
    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  1d 03h | Avg: 37m 12s | Max:  1h 33m | Hits:  56%/79910 
      🟩 arm64              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 25s | Max: 32m 41s | Hits:  47%/3553  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 36m | Avg: 43m 14s | Max:  1h 14m | Hits:  50%/8876  
      🟩 12.8               Pass: 100%/42  | Total:  1d 01h | Avg: 36m 13s | Max:  1h 33m | Hits:  56%/74587 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  1h 00m | Avg: 30m 27s | Max: 30m 58s | Hits:  48%/3552  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 36m | Avg: 43m 14s | Max:  1h 14m | Hits:  50%/8876  
      🟩 nvcc12.8           Pass: 100%/40  | Total:  1d 00h | Avg: 36m 30s | Max:  1h 33m | Hits:  57%/71035 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 00m | Avg: 30m 27s | Max: 30m 58s | Hits:  48%/3552  
      🟩 nvcc               Pass: 100%/45  | Total:  1d 03h | Avg: 37m 15s | Max:  1h 33m | Hits:  56%/79911 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 41s | Max: 36m 40s | Hits:  58%/7104  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 14m | Avg: 37m 01s | Max: 38m 51s | Hits:  48%/3552  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 09m | Avg: 34m 49s | Max: 35m 35s | Hits:  48%/3552  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 12m | Avg: 36m 06s | Max: 37m 52s | Hits:  48%/3552  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 14m | Avg: 37m 01s | Max: 37m 19s | Hits:  48%/3552  
      🟩 Clang19            Pass: 100%/7   | Total:  3h 00m | Avg: 25m 43s | Max: 38m 46s | Hits:  64%/12432 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 18s | Max: 35m 29s | Hits:  48%/3554  
      🟩 GCC8               Pass: 100%/1   | Total: 34m 18s | Avg: 34m 18s | Max: 34m 18s | Hits:  47%/1777  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 14m | Avg: 37m 09s | Max: 37m 53s | Hits:  61%/3554  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 44s | Max: 37m 53s | Hits:  47%/3554  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 16m | Avg: 38m 13s | Max: 41m 56s | Hits:  47%/3554  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 38s | Max: 36m 06s | Hits:  47%/3554  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 56m | Avg: 23m 40s | Max: 37m 56s | Hits:  74%/17770 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 27m | Avg:  1h 13m | Max:  1h 14m | Hits:  38%/3540  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 46m | Avg: 55m 37s | Max:  1h 13m | Hits:  50%/5310  
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  3h 00m | Avg:  1h 30m | Max:  1h 33m | Hits:  25%/3552  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 10h 08m | Avg: 32m 02s | Max: 38m 51s | Hits:  56%/33744 
      🟩 GCC                Pass: 100%/21  | Total: 10h 33m | Avg: 30m 09s | Max: 41m 56s | Hits:  61%/37317 
      🟩 MSVC               Pass: 100%/5   | Total:  5h 14m | Avg:  1h 02m | Max:  1h 14m | Hits:  45%/8850  
      🟩 NVHPC              Pass: 100%/2   | Total:  3h 00m | Avg:  1h 30m | Max:  1h 33m | Hits:  25%/3552  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 38m 17s | Avg: 19m 08s | Max: 26m 20s | Hits:  73%/3554  
      🟩 rtx2080            Pass: 100%/35  | Total:  1d 00h | Avg: 41m 28s | Max:  1h 33m | Hits:  48%/62156 
      🟩 rtx4090            Pass: 100%/10  | Total:  4h 07m | Avg: 24m 46s | Max:  1h 13m | Hits:  80%/17753 
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  1d 03h | Avg: 41m 10s | Max:  1h 33m | Hits:  48%/71033 
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 58s | Avg: 15m 39s | Max: 31m 23s | Hits:  99%/5323  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 42s | Avg: 10m 55s | Max: 11m 57s | Hits:  99%/7107  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 38m 17s | Avg: 19m 08s | Max: 26m 20s | Hits:  73%/3554  
      🟩 90;90a;100         Pass: 100%/1   | Total: 35m 13s | Avg: 35m 13s | Max: 35m 13s | Hits:  75%/1777  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total: 15h 21m | Avg: 43m 52s | Max:  1h 33m | Hits:  47%/37287 
      🟩 20                 Pass: 100%/24  | Total: 12h 53m | Avg: 32m 13s | Max:  1h 27m | Hits:  62%/42622 
    
  • 🟩 cudax: Pass: 100%/24 | Total: 2h 50m | Avg: 7m 06s | Max: 14m 00s | Hits: 90%/13372

    🟩 cpu
      🟩 amd64              Pass: 100%/20  | Total:  2h 31m | Avg:  7m 34s | Max: 14m 00s | Hits:  90%/11044 
      🟩 arm64              Pass: 100%/4   | Total: 19m 16s | Avg:  4m 49s | Max:  5m 06s | Hits:  90%/2328  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 12m 16s | Avg: 12m 16s | Max: 12m 16s | Hits:  77%/284   
      🟩 12.8               Pass: 100%/23  | Total:  2h 38m | Avg:  6m 53s | Max: 14m 00s | Hits:  90%/13088 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 16s | Avg: 12m 16s | Max: 12m 16s | Hits:  77%/284   
      🟩 nvcc12.8           Pass: 100%/23  | Total:  2h 38m | Avg:  6m 53s | Max: 14m 00s | Hits:  90%/13088 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/24  | Total:  2h 50m | Avg:  7m 06s | Max: 14m 00s | Hits:  90%/13372 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  5m 02s | Avg:  5m 02s | Max:  5m 02s | Hits:  90%/584   
      🟩 Clang15            Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s | Hits:  90%/582   
      🟩 Clang16            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s | Hits:  90%/582   
      🟩 Clang17            Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s | Hits:  90%/582   
      🟩 Clang18            Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s | Hits:  90%/582   
      🟩 Clang19            Pass: 100%/4   | Total: 26m 27s | Avg:  6m 36s | Max: 11m 59s | Hits:  92%/2328  
      🟩 GCC10              Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s | Hits:  90%/584   
      🟩 GCC11              Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s | Hits:  90%/582   
      🟩 GCC12              Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s | Hits:  90%/582   
      🟩 GCC13              Pass: 100%/8   | Total: 56m 03s | Avg:  7m 00s | Max: 14m 00s | Hits:  92%/4656  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 16s | Avg: 12m 16s | Max: 12m 16s | Hits:  77%/284   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 12m 17s | Avg: 12m 17s | Max: 12m 17s | Hits:  57%/284   
      🟩 NVHPC25.3          Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max:  9m 57s | Hits:  88%/1160  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/9   | Total: 53m 21s | Avg:  5m 55s | Max: 11m 59s | Hits:  91%/5240  
      🟩 GCC                Pass: 100%/11  | Total:  1h 12m | Avg:  6m 37s | Max: 14m 00s | Hits:  91%/6404  
      🟩 MSVC               Pass: 100%/2   | Total: 24m 33s | Avg: 12m 16s | Max: 12m 17s | Hits:  67%/568   
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max:  9m 57s | Hits:  88%/1160  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 18m 23s | Avg:  9m 11s | Max: 14m 00s | Hits:  95%/1164  
      🟩 rtx2080            Pass: 100%/22  | Total:  2h 32m | Avg:  6m 55s | Max: 12m 59s | Hits:  90%/12208 
    🟩 jobs
      🟩 Build              Pass: 100%/21  | Total:  2h 11m | Avg:  6m 16s | Max: 12m 17s | Hits:  89%/11626 
      🟩 Test               Pass: 100%/3   | Total: 38m 58s | Avg: 12m 59s | Max: 14m 00s | Hits:  99%/1746  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 22m 41s | Avg:  7m 33s | Max: 14m 00s | Hits:  93%/1746  
      🟩 90a                Pass: 100%/1   | Total:  4m 46s | Avg:  4m 46s | Max:  4m 46s | Hits:  90%/582   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 23m 44s | Avg:  5m 56s | Max:  9m 57s | Hits:  89%/2326  
      🟩 20                 Pass: 100%/20  | Total:  2h 26m | Avg:  7m 20s | Max: 14m 00s | Hits:  90%/11046 
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 19m 34s | Avg: 4m 53s | Max: 5m 15s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  5m 15s
      🟩 arm64              Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  4m 42s
    🟩 ctk
      🟩 12.8               Pass: 100%/4   | Total: 19m 34s | Avg:  4m 53s | Max:  5m 15s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/4   | Total: 19m 34s | Avg:  4m 53s | Max:  5m 15s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 19m 34s | Avg:  4m 53s | Max:  5m 15s
    🟩 cxx
      🟩 NVHPC25.3          Pass: 100%/4   | Total: 19m 34s | Avg:  4m 53s | Max:  5m 15s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 19m 34s | Avg:  4m 53s | Max:  5m 15s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 19m 34s | Avg:  4m 53s | Max:  5m 15s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 19m 34s | Avg:  4m 53s | Max:  5m 15s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  9m 57s | Avg:  4m 58s | Max:  5m 15s
      🟩 20                 Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  5m 08s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 24m 19s | Avg: 12m 09s | Max: 21m 37s | Hits: 96%/328

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 24m 19s | Avg: 12m 09s | Max: 21m 37s | Hits:  96%/328   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 24m 19s | Avg: 12m 09s | Max: 21m 37s | Hits:  96%/328   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 24m 19s | Avg: 12m 09s | Max: 21m 37s | Hits:  96%/328   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 24m 19s | Avg: 12m 09s | Max: 21m 37s | Hits:  96%/328   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 24m 19s | Avg: 12m 09s | Max: 21m 37s | Hits:  96%/328   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 24m 19s | Avg: 12m 09s | Max: 21m 37s | Hits:  96%/328   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 24m 19s | Avg: 12m 09s | Max: 21m 37s | Hits:  96%/328   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 42s | Avg:  2m 42s | Max:  2m 42s | Hits:  93%/164   
      🟩 Test               Pass: 100%/1   | Total: 21m 37s | Avg: 21m 37s | Max: 21m 37s | Hits:  98%/164   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 35m | Avg: 1h 35m | Max: 1h 35m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 35m | Avg:  1h 35m | Max:  1h 35m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 35m | Avg:  1h 35m | Max:  1h 35m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 35m | Avg:  1h 35m | Max:  1h 35m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 35m | Avg:  1h 35m | Max:  1h 35m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 35m | Avg:  1h 35m | Max:  1h 35m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 35m | Avg:  1h 35m | Max:  1h 35m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 35m | Avg:  1h 35m | Max:  1h 35m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 35m | Avg:  1h 35m | Max:  1h 35m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 170)

# Runner
121 linux-amd64-cpu16
15 windows-amd64-cpu16
12 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

miscco commented Apr 16, 2025

/ok to test 6fd7821

🟩 CI finished in 2h 26m: Pass: 100%/170 | Total: 3d 18h | Avg: 32m 05s | Max: 1h 34m | Hits: 69%/269901
  • 🟩 cub: Pass: 100%/47 | Total: 2d 02h | Avg: 1h 04m | Max: 1h 34m | Hits: 30%/56545

    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  1d 23h | Avg:  1h 03m | Max:  1h 34m | Hits:  30%/54087 
      🟩 arm64              Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 08m | Hits:  16%/2458  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 55m | Avg:  1h 11m | Max:  1h 25m | Hits:  15%/5974  
      🟩 12.8               Pass: 100%/42  | Total:  1d 20h | Avg:  1h 03m | Max:  1h 34m | Hits:  32%/50571 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  15%/2120  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 55m | Avg:  1h 11m | Max:  1h 25m | Hits:  15%/5974  
      🟩 nvcc12.8           Pass: 100%/40  | Total:  1d 18h | Avg:  1h 03m | Max:  1h 34m | Hits:  32%/48451 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  15%/2120  
      🟩 nvcc               Pass: 100%/45  | Total:  2d 00h | Avg:  1h 04m | Max:  1h 34m | Hits:  30%/54425 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 24m | Avg:  1h 06m | Max:  1h 07m | Hits:  16%/4924  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m | Hits:  16%/2458  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 08m | Hits:  16%/2458  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 19m | Hits:  16%/2458  
      🟩 Clang18            Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 16m | Hits:  16%/2458  
      🟩 Clang19            Pass: 100%/7   | Total:  6h 26m | Avg: 55m 09s | Max:  1h 16m | Hits:  41%/8265  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 18m | Hits:  16%/2462  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 16m | Avg:  1h 16m | Max:  1h 16m | Hits:  16%/1231  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 18m | Hits:  16%/2462  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 34m | Avg:  1h 17m | Max:  1h 19m | Hits:  16%/2462  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m | Hits:  15%/2458  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 41m | Avg:  1h 20m | Max:  1h 20m | Hits:  15%/2458  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 44m | Avg: 42m 13s | Max:  1h 15m | Hits:  61%/13519 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 43m | Avg:  1h 21m | Max:  1h 25m | Hits:  12%/2100  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  3h 04m | Avg:  1h 32m | Max:  1h 34m | Hits:  12%/2100  
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  2h 44m | Avg:  1h 22m | Max:  1h 25m | Hits:  12%/2272  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 20h 11m | Avg:  1h 03m | Max:  1h 19m | Hits:  25%/23021 
      🟩 GCC                Pass: 100%/22  | Total: 21h 30m | Avg: 58m 39s | Max:  1h 20m | Hits:  38%/27052 
      🟩 MSVC               Pass: 100%/4   | Total:  5h 47m | Avg:  1h 26m | Max:  1h 34m | Hits:  12%/4200  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 44m | Avg:  1h 22m | Max:  1h 25m | Hits:  12%/2272  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 21m | Avg: 27m 19s | Max: 31m 51s | Hits:  71%/3687  
      🟩 rtx2080            Pass: 100%/36  | Total:  1d 20h | Avg:  1h 13m | Max:  1h 34m | Hits:  15%/43026 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 42m | Avg: 35m 21s | Max:  1h 08m | Hits:  78%/9832  
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  1d 22h | Avg:  1h 12m | Max:  1h 34m | Hits:  15%/46713 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 27m 55s | Avg: 27m 55s | Max: 27m 55s | Hits:  99%/1229  
      🟩 GraphCapture       Pass: 100%/1   | Total: 21m 09s | Avg: 21m 09s | Max: 21m 09s | Hits:  99%/1229  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 22m | Avg: 27m 28s | Max: 28m 52s | Hits:  99%/3687  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 08m | Avg: 22m 59s | Max: 23m 36s | Hits:  99%/3687  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 21m | Avg: 27m 19s | Max: 31m 51s | Hits:  71%/3687  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 12m | Avg:  1h 12m | Max:  1h 12m | Hits:  15%/1229  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  1d 01h | Avg:  1h 13m | Max:  1h 29m | Hits:  15%/25026 
      🟩 20                 Pass: 100%/26  | Total:  1d 00h | Avg: 56m 20s | Max:  1h 34m | Hits:  42%/31519 
    
  • 🟩 thrust: Pass: 100%/47 | Total: 1d 05h | Avg: 37m 06s | Max: 1h 29m | Hits: 56%/83463

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 45m 32s | Avg: 22m 46s | Max: 34m 10s | Hits:  73%/3554  
    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  1d 03h | Avg: 37m 19s | Max:  1h 29m | Hits:  56%/79910 
      🟩 arm64              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 13s | Max: 33m 52s | Hits:  47%/3553  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 26m | Avg: 41m 22s | Max:  1h 05m | Hits:  51%/8876  
      🟩 12.8               Pass: 100%/42  | Total:  1d 01h | Avg: 36m 36s | Max:  1h 29m | Hits:  56%/74587 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  1h 04m | Avg: 32m 03s | Max: 33m 32s | Hits:  48%/3552  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 26m | Avg: 41m 22s | Max:  1h 05m | Hits:  51%/8876  
      🟩 nvcc12.8           Pass: 100%/40  | Total:  1d 00h | Avg: 36m 50s | Max:  1h 29m | Hits:  57%/71035 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 04m | Avg: 32m 03s | Max: 33m 32s | Hits:  48%/3552  
      🟩 nvcc               Pass: 100%/45  | Total:  1d 04h | Avg: 37m 20s | Max:  1h 29m | Hits:  56%/79911 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 19m | Avg: 34m 51s | Max: 36m 08s | Hits:  56%/7104  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 13m | Avg: 36m 43s | Max: 38m 47s | Hits:  48%/3552  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 12m | Avg: 36m 13s | Max: 38m 37s | Hits:  48%/3552  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 11m | Avg: 35m 36s | Max: 36m 26s | Hits:  48%/3552  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 07s | Max: 33m 34s | Hits:  48%/3552  
      🟩 Clang19            Pass: 100%/7   | Total:  2h 57m | Avg: 25m 21s | Max: 33m 32s | Hits:  64%/12432 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 41s | Max: 35m 24s | Hits:  53%/3554  
      🟩 GCC8               Pass: 100%/1   | Total: 40m 42s | Avg: 40m 42s | Max: 40m 42s | Hits:  47%/1777  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 55s | Max: 35m 40s | Hits:  55%/3554  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 25s | Max: 35m 25s | Hits:  47%/3554  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 19m | Avg: 39m 38s | Max: 40m 22s | Hits:  47%/3554  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 23m | Avg: 41m 36s | Max: 45m 55s | Hits:  47%/3554  
      🟩 GCC13              Pass: 100%/10  | Total:  4h 05m | Avg: 24m 30s | Max: 42m 29s | Hits:  74%/17770 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits:  41%/3540  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  3h 03m | Avg:  1h 01m | Max:  1h 22m | Hits:  50%/5310  
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  2h 55m | Avg:  1h 27m | Max:  1h 29m | Hits:  25%/3552  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 10h 00m | Avg: 31m 35s | Max: 38m 47s | Hits:  55%/33744 
      🟩 GCC                Pass: 100%/21  | Total: 10h 58m | Avg: 31m 20s | Max: 45m 55s | Hits:  61%/37317 
      🟩 MSVC               Pass: 100%/5   | Total:  5h 10m | Avg:  1h 02m | Max:  1h 22m | Hits:  46%/8850  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 55m | Avg:  1h 27m | Max:  1h 29m | Hits:  25%/3552  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 31m 46s | Avg: 15m 53s | Max: 20m 44s | Hits:  87%/3554  
      🟩 rtx2080            Pass: 100%/35  | Total:  1d 00h | Avg: 41m 26s | Max:  1h 29m | Hits:  47%/62156 
      🟩 rtx4090            Pass: 100%/10  | Total:  4h 22m | Avg: 26m 13s | Max:  1h 22m | Hits:  80%/17753 
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  1d 03h | Avg: 41m 25s | Max:  1h 29m | Hits:  48%/71033 
      🟩 TestCPU            Pass: 100%/3   | Total: 43m 50s | Avg: 14m 36s | Max: 28m 16s | Hits:  99%/5323  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 37s | Avg: 10m 54s | Max: 11m 22s | Hits:  99%/7107  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 31m 46s | Avg: 15m 53s | Max: 20m 44s | Hits:  87%/3554  
      🟩 90;90a;100         Pass: 100%/1   | Total: 42m 29s | Avg: 42m 29s | Max: 42m 29s | Hits:  47%/1777  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total: 15h 06m | Avg: 43m 11s | Max:  1h 29m | Hits:  47%/37287 
      🟩 20                 Pass: 100%/24  | Total: 13h 11m | Avg: 32m 59s | Max:  1h 25m | Hits:  62%/42622 
    
  • 🟩 libcudacxx: Pass: 100%/45 | Total: 6h 19m | Avg: 8m 25s | Max: 27m 38s | Hits: 94%/116193

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 11m | Avg:  8m 38s | Max: 27m 38s | Hits:  94%/110166
      🟩 arm64              Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  4m 10s | Hits:  98%/6027  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 41m 43s | Avg:  8m 20s | Max: 26m 03s | Hits:  99%/14685 
      🟩 12.8               Pass: 100%/40  | Total:  5h 37m | Avg:  8m 26s | Max: 27m 38s | Hits:  94%/101508
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 47m 32s | Avg: 23m 46s | Max: 25m 35s | Hits:  27%/5987  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 41m 43s | Avg:  8m 20s | Max: 26m 03s | Hits:  99%/14685 
      🟩 nvcc12.8           Pass: 100%/38  | Total:  4h 50m | Avg:  7m 38s | Max: 27m 38s | Hits:  98%/95521 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 32s | Avg: 23m 46s | Max: 25m 35s | Hits:  27%/5987  
      🟩 nvcc               Pass: 100%/43  | Total:  5h 31m | Avg:  7m 43s | Max: 27m 38s | Hits:  98%/110206
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 17m 31s | Avg:  4m 22s | Max:  4m 43s | Hits:  98%/11942 
      🟩 Clang15            Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 47s | Hits:  99%/5983  
      🟩 Clang16            Pass: 100%/2   | Total:  9m 21s | Avg:  4m 40s | Max:  4m 43s | Hits:  99%/5983  
      🟩 Clang17            Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  5m 02s | Hits:  99%/5983  
      🟩 Clang18            Pass: 100%/2   | Total:  9m 26s | Avg:  4m 43s | Max:  4m 44s | Hits:  99%/5983  
      🟩 Clang19            Pass: 100%/6   | Total:  1h 11m | Avg: 11m 56s | Max: 25m 35s | Hits:  70%/14983 
      🟩 GCC7               Pass: 100%/2   | Total:  7m 56s | Avg:  3m 58s | Max:  4m 17s | Hits:  99%/5919  
      🟩 GCC8               Pass: 100%/1   | Total:  4m 04s | Avg:  4m 04s | Max:  4m 04s | Hits:  99%/2970  
      🟩 GCC9               Pass: 100%/2   | Total:  8m 10s | Avg:  4m 05s | Max:  4m 16s | Hits:  99%/5931  
      🟩 GCC10              Pass: 100%/2   | Total:  8m 11s | Avg:  4m 05s | Max:  4m 06s | Hits:  98%/5989  
      🟩 GCC11              Pass: 100%/2   | Total:  8m 43s | Avg:  4m 21s | Max:  4m 39s | Hits:  98%/5985  
      🟩 GCC12              Pass: 100%/2   | Total:  8m 24s | Avg:  4m 12s | Max:  4m 13s | Hits:  99%/5985  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 24m | Avg:  8m 26s | Max: 20m 51s | Hits:  98%/15245 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 47m 35s | Avg: 23m 47s | Max: 26m 03s | Hits:  98%/5633  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 54m 49s | Avg: 27m 24s | Max: 27m 38s | Hits:  88%/5706  
      🟩 NVHPC25.3          Pass: 100%/2   | Total: 20m 09s | Avg: 10m 04s | Max: 10m 05s | Hits:  98%/5973  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/18  | Total:  2h 06m | Avg:  7m 03s | Max: 25m 35s | Hits:  90%/50857 
      🟩 GCC                Pass: 100%/21  | Total:  2h 09m | Avg:  6m 11s | Max: 20m 51s | Hits:  99%/48024 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 42m | Avg: 25m 36s | Max: 27m 38s | Hits:  93%/11339 
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 09s | Avg: 10m 04s | Max: 10m 05s | Hits:  98%/5973  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 18m 25s | Avg:  9m 12s | Max: 14m 10s | Hits:  98%/3103  
      🟩 rtx2080            Pass: 100%/43  | Total:  6h 00m | Avg:  8m 23s | Max: 27m 38s | Hits:  94%/113090
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  5h 07m | Avg:  7m 53s | Max: 27m 38s | Hits:  94%/116153
      🟩 NVRTC              Pass: 100%/2   | Total: 36m 20s | Avg: 18m 10s | Max: 20m 51s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 33m 15s | Avg: 11m 05s | Max: 14m 10s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 12s | Avg:  2m 12s | Max:  2m 12s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 36m 20s | Avg: 18m 10s | Max: 20m 51s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 18m 25s | Avg:  9m 12s | Max: 14m 10s | Hits:  98%/3103  
      🟩 90;90a;100         Pass: 100%/1   | Total:  4m 29s | Avg:  4m 29s | Max:  4m 29s | Hits:  98%/3103  
    🟩 std
      🟩 17                 Pass: 100%/22  | Total:  3h 17m | Avg:  8m 58s | Max: 27m 11s | Hits:  94%/61897 
      🟩 20                 Pass: 100%/22  | Total:  2h 59m | Avg:  8m 09s | Max: 27m 38s | Hits:  94%/54296 
    
  • 🟩 cudax: Pass: 100%/24 | Total: 3h 02m | Avg: 7m 36s | Max: 16m 35s | Hits: 90%/13372

    🟩 cpu
      🟩 amd64              Pass: 100%/20  | Total:  2h 43m | Avg:  8m 09s | Max: 16m 35s | Hits:  91%/11044 
      🟩 arm64              Pass: 100%/4   | Total: 19m 20s | Avg:  4m 50s | Max:  5m 07s | Hits:  90%/2328  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 48s | Avg: 11m 48s | Max: 11m 48s | Hits:  77%/284   
      🟩 12.8               Pass: 100%/23  | Total:  2h 50m | Avg:  7m 25s | Max: 16m 35s | Hits:  91%/13088 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 48s | Avg: 11m 48s | Max: 11m 48s | Hits:  77%/284   
      🟩 nvcc12.8           Pass: 100%/23  | Total:  2h 50m | Avg:  7m 25s | Max: 16m 35s | Hits:  91%/13088 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/24  | Total:  3h 02m | Avg:  7m 36s | Max: 16m 35s | Hits:  90%/13372 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s | Hits:  90%/584   
      🟩 Clang15            Pass: 100%/1   | Total:  5m 14s | Avg:  5m 14s | Max:  5m 14s | Hits:  90%/582   
      🟩 Clang16            Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s | Hits:  90%/582   
      🟩 Clang17            Pass: 100%/1   | Total:  6m 00s | Avg:  6m 00s | Max:  6m 00s | Hits:  90%/582   
      🟩 Clang18            Pass: 100%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s | Hits:  90%/582   
      🟩 Clang19            Pass: 100%/4   | Total: 28m 27s | Avg:  7m 06s | Max: 13m 22s | Hits:  92%/2328  
      🟩 GCC10              Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s | Hits:  90%/584   
      🟩 GCC11              Pass: 100%/1   | Total:  5m 24s | Avg:  5m 24s | Max:  5m 24s | Hits:  90%/582   
      🟩 GCC12              Pass: 100%/1   | Total:  6m 11s | Avg:  6m 11s | Max:  6m 11s | Hits:  90%/582   
      🟩 GCC13              Pass: 100%/8   | Total:  1h 01m | Avg:  7m 37s | Max: 16m 35s | Hits:  92%/4656  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 48s | Avg: 11m 48s | Max: 11m 48s | Hits:  77%/284   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 14m 52s | Avg: 14m 52s | Max: 14m 52s | Hits:  78%/284   
      🟩 NVHPC25.3          Pass: 100%/2   | Total: 21m 15s | Avg: 10m 37s | Max: 11m 05s | Hits:  88%/1160  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/9   | Total: 56m 12s | Avg:  6m 14s | Max: 13m 22s | Hits:  91%/5240  
      🟩 GCC                Pass: 100%/11  | Total:  1h 18m | Avg:  7m 07s | Max: 16m 35s | Hits:  91%/6404  
      🟩 MSVC               Pass: 100%/2   | Total: 26m 40s | Avg: 13m 20s | Max: 14m 52s | Hits:  77%/568   
      🟩 NVHPC              Pass: 100%/2   | Total: 21m 15s | Avg: 10m 37s | Max: 11m 05s | Hits:  88%/1160  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max: 14m 11s | Hits:  95%/1164  
      🟩 rtx2080            Pass: 100%/22  | Total:  2h 43m | Avg:  7m 26s | Max: 16m 35s | Hits:  90%/12208 
    🟩 jobs
      🟩 Build              Pass: 100%/21  | Total:  2h 18m | Avg:  6m 35s | Max: 14m 52s | Hits:  89%/11626 
      🟩 Test               Pass: 100%/3   | Total: 44m 08s | Avg: 14m 42s | Max: 16m 35s | Hits:  99%/1746  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 23m 33s | Avg:  7m 51s | Max: 14m 11s | Hits:  93%/1746  
      🟩 90a                Pass: 100%/1   | Total:  4m 44s | Avg:  4m 44s | Max:  4m 44s | Hits:  90%/582   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 25m 28s | Avg:  6m 22s | Max: 11m 05s | Hits:  89%/2326  
      🟩 20                 Pass: 100%/20  | Total:  2h 37m | Avg:  7m 51s | Max: 16m 35s | Hits:  91%/11046 
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 17m 55s | Avg: 4m 28s | Max: 5m 11s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 19s | Avg:  5m 09s | Max:  5m 11s
      🟩 arm64              Pass: 100%/2   | Total:  7m 36s | Avg:  3m 48s | Max:  3m 49s
    🟩 ctk
      🟩 12.8               Pass: 100%/4   | Total: 17m 55s | Avg:  4m 28s | Max:  5m 11s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/4   | Total: 17m 55s | Avg:  4m 28s | Max:  5m 11s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 17m 55s | Avg:  4m 28s | Max:  5m 11s
    🟩 cxx
      🟩 NVHPC25.3          Pass: 100%/4   | Total: 17m 55s | Avg:  4m 28s | Max:  5m 11s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 17m 55s | Avg:  4m 28s | Max:  5m 11s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 17m 55s | Avg:  4m 28s | Max:  5m 11s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 17m 55s | Avg:  4m 28s | Max:  5m 11s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  5m 11s
      🟩 20                 Pass: 100%/2   | Total:  8m 55s | Avg:  4m 27s | Max:  5m 08s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 24m 31s | Avg: 12m 15s | Max: 21m 45s | Hits: 96%/328

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 24m 31s | Avg: 12m 15s | Max: 21m 45s | Hits:  96%/328   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 24m 31s | Avg: 12m 15s | Max: 21m 45s | Hits:  96%/328   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 24m 31s | Avg: 12m 15s | Max: 21m 45s | Hits:  96%/328   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 24m 31s | Avg: 12m 15s | Max: 21m 45s | Hits:  96%/328   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 24m 31s | Avg: 12m 15s | Max: 21m 45s | Hits:  96%/328   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 24m 31s | Avg: 12m 15s | Max: 21m 45s | Hits:  96%/328   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 24m 31s | Avg: 12m 15s | Max: 21m 45s | Hits:  96%/328   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s | Hits:  93%/164   
      🟩 Test               Pass: 100%/1   | Total: 21m 45s | Avg: 21m 45s | Max: 21m 45s | Hits:  98%/164   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 32m | Avg: 1h 32m | Max: 1h 32m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 32m | Avg:  1h 32m | Max:  1h 32m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 32m | Avg:  1h 32m | Max:  1h 32m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 32m | Avg:  1h 32m | Max:  1h 32m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 32m | Avg:  1h 32m | Max:  1h 32m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 32m | Avg:  1h 32m | Max:  1h 32m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 32m | Avg:  1h 32m | Max:  1h 32m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 32m | Avg:  1h 32m | Max:  1h 32m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 32m | Avg:  1h 32m | Max:  1h 32m
    

👃 Inspect Changes

Modifications in project?

    Project
    CCCL Infrastructure
+/- libcu++
    CUB
    Thrust
    CUDA Experimental
    stdpar
    python
    CCCL C Parallel Library
    Catch2Helper

Modifications in project or dependencies?

    Project
    CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 170)

# Runner
121 linux-amd64-cpu16
15 windows-amd64-cpu16
12 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
