Changes DispatchScan[ByKey] documentation to advise using unsigned offset types #3111

Merged: 1 commit, Dec 10, 2024

elstehle
Collaborator

Description

#2171 introduced support for large problem sizes in DeviceScan and DeviceScanByKey. With that change, we also switched to unsigned offset types, which are selected via the choose_offset_t utility.

In that PR, I forgot to update the documentation of the corresponding Dispatch class templates, which still suggests using signed offset types. This PR fixes that documentation.
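To illustrate the idea of deriving an unsigned offset type from the type used for the number of items, here is a minimal, standalone sketch. The names `sketch_choose_offset`, `sketch_choose_offset_t`, and `to_offset` are hypothetical and exist only for this example; the sketch assumes a simple rule (32-bit unsigned for item-count types of up to 4 bytes, 64-bit unsigned otherwise) and is not the actual implementation of `choose_offset_t` in CUB.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for an offset-type chooser (NOT CUB's implementation).
// Assumed rule: item-count types of up to 4 bytes map to a 32-bit unsigned
// offset type; wider types map to a 64-bit unsigned offset type.
template <typename NumItemsT>
struct sketch_choose_offset
{
  static_assert(std::is_integral<NumItemsT>::value,
                "NumItemsT must be an integral type");
  using type = typename std::conditional<(sizeof(NumItemsT) <= 4),
                                         std::uint32_t,
                                         std::uint64_t>::type;
};

template <typename NumItemsT>
using sketch_choose_offset_t = typename sketch_choose_offset<NumItemsT>::type;

// Converts a user-provided item count to the chosen unsigned offset type.
// The caller is expected to pass a non-negative count.
template <typename NumItemsT>
constexpr sketch_choose_offset_t<NumItemsT> to_offset(NumItemsT num_items)
{
  return static_cast<sketch_choose_offset_t<NumItemsT>>(num_items);
}
```

With such a helper, a caller passing a signed `int` count would get a 32-bit unsigned offset type, while a caller passing a 64-bit count would get a 64-bit unsigned offset type, so the offset type instantiated for the dispatch layer is always unsigned.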

Thanks to @bernhardmgruber for pointing it out.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Contributor

🟩 CI finished in 1h 14m: Pass: 100%/94 | Total: 14h 55m | Avg: 9m 31s | Max: 53m 13s | Hits: 99%/12288
  • 🟩 thrust: Pass: 100%/46 | Total: 7h 07m | Avg: 9m 17s | Max: 36m 35s | Hits: 99%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 28m 07s | Avg: 14m 03s | Max: 22m 19s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total:  6h 57m | Avg:  9m 29s | Max: 36m 35s | Hits:  99%/9260  
      🟩 arm64              Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 17s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total: 48m 00s | Avg:  6m 51s | Max: 22m 58s | Hits:  99%/1852  
      🟩 12.5               Pass: 100%/2   | Total: 27m 42s | Avg: 13m 51s | Max: 13m 59s
      🟩 12.6               Pass: 100%/37  | Total:  5h 51m | Avg:  9m 30s | Max: 36m 35s | Hits:  99%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 44s | Avg:  4m 52s | Max:  4m 52s
      🟩 nvcc11.1           Pass: 100%/7   | Total: 48m 00s | Avg:  6m 51s | Max: 22m 58s | Hits:  99%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 27m 42s | Avg: 13m 51s | Max: 13m 59s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  5h 42m | Avg:  9m 46s | Max: 36m 35s | Hits:  99%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 44s | Avg:  4m 52s | Max:  4m 52s
      🟩 nvcc               Pass: 100%/44  | Total:  6h 57m | Avg:  9m 29s | Max: 36m 35s | Hits:  99%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 20m 47s | Avg:  5m 11s | Max:  6m 14s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 18s | Avg:  6m 18s | Max:  6m 18s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 17s | Avg:  5m 17s | Max:  5m 17s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 07s | Avg:  5m 07s | Max:  5m 07s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 06s | Avg:  5m 06s | Max:  5m 06s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 58s | Avg:  5m 58s | Max:  5m 58s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🟩 Clang18            Pass: 100%/7   | Total: 46m 00s | Avg:  6m 34s | Max: 12m 07s
      🟩 GCC6               Pass: 100%/2   | Total:  7m 51s | Avg:  3m 55s | Max:  4m 12s
      🟩 GCC7               Pass: 100%/2   | Total:  9m 28s | Avg:  4m 44s | Max:  5m 02s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 22s | Avg:  5m 22s | Max:  5m 22s
      🟩 GCC9               Pass: 100%/3   | Total: 13m 26s | Avg:  4m 28s | Max:  5m 20s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 41s | Avg:  5m 41s | Max:  5m 41s
      🟩 GCC11              Pass: 100%/1   | Total: 36m 35s | Avg: 36m 35s | Max: 36m 35s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 38m | Avg: 12m 15s | Max: 33m 10s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  7m 27s | Avg:  7m 27s | Max:  7m 27s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 22m 58s | Avg: 22m 58s | Max: 22m 58s | Hits:  99%/1852  
      🟩 MSVC14.29          Pass: 100%/1   | Total: 15m 06s | Avg: 15m 06s | Max: 15m 06s | Hits:  99%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 00m | Avg: 20m 02s | Max: 23m 46s | Hits:  99%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 27m 42s | Avg: 13m 51s | Max: 13m 59s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  1h 51m | Avg:  5m 53s | Max: 12m 07s
      🟩 GCC                Pass: 100%/19  | Total:  3h 02m | Avg:  9m 35s | Max: 36m 35s
      🟩 Intel              Pass: 100%/1   | Total:  7m 27s | Avg:  7m 27s | Max:  7m 27s
      🟩 MSVC               Pass: 100%/5   | Total:  1h 38m | Avg: 19m 38s | Max: 23m 46s | Hits:  99%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total: 27m 42s | Avg: 13m 51s | Max: 13m 59s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total:  7h 07m | Avg:  9m 17s | Max: 36m 35s | Hits:  99%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  5h 40m | Avg:  8m 31s | Max: 36m 35s | Hits:  99%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 39m 13s | Avg: 13m 04s | Max: 23m 46s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 47m 25s | Avg: 15m 48s | Max: 22m 19s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 38s | Avg:  4m 38s | Max:  4m 38s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 21m 25s | Avg:  4m 17s | Max:  5m 28s
      🟩 14                 Pass: 100%/4   | Total: 38m 26s | Avg:  9m 36s | Max: 22m 58s | Hits:  99%/1852  
      🟩 17                 Pass: 100%/12  | Total:  2h 03m | Avg: 10m 17s | Max: 33m 10s | Hits:  99%/3704  
      🟩 20                 Pass: 100%/23  | Total:  3h 36m | Avg:  9m 23s | Max: 36m 35s | Hits:  99%/3704  
    
  • 🟩 cub: Pass: 100%/45 | Total: 7h 05m | Avg: 9m 27s | Max: 53m 13s | Hits: 99%/3028

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 56m | Avg:  9m 40s | Max: 53m 13s | Hits:  99%/3028  
      🟩 arm64              Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  4m 59s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total: 39m 40s | Avg:  5m 40s | Max: 14m 15s | Hits:  99%/757   
      🟩 12.5               Pass: 100%/2   | Total: 18m 34s | Avg:  9m 17s | Max:  9m 25s
      🟩 12.6               Pass: 100%/36  | Total:  6h 07m | Avg: 10m 12s | Max: 53m 13s | Hits:  99%/2271  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 27s
      🟩 nvcc11.1           Pass: 100%/7   | Total: 39m 40s | Avg:  5m 40s | Max: 14m 15s | Hits:  99%/757   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 34s | Avg:  9m 17s | Max:  9m 25s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  5h 58m | Avg: 10m 33s | Max: 53m 13s | Hits:  99%/2271  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 27s
      🟩 nvcc               Pass: 100%/43  | Total:  6h 57m | Avg:  9m 42s | Max: 53m 13s | Hits:  99%/3028  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 20m 34s | Avg:  5m 08s | Max:  6m 14s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 33s | Avg:  6m 33s | Max:  6m 33s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 09s | Avg:  5m 09s | Max:  5m 09s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 17s | Avg:  5m 17s | Max:  5m 17s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 11s | Avg:  5m 11s | Max:  5m 11s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 10s | Avg:  5m 10s | Max:  5m 10s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 18m | Avg: 11m 09s | Max: 34m 26s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 19s | Avg:  4m 09s | Max:  4m 11s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 15s | Avg:  5m 07s | Max:  5m 12s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s
      🟩 GCC9               Pass: 100%/3   | Total: 13m 43s | Avg:  4m 34s | Max:  5m 17s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 14s | Avg:  5m 14s | Max:  5m 14s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 31m | Avg: 18m 56s | Max: 53m 13s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  7m 00s | Avg:  7m 00s | Max:  7m 00s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 14m 15s | Avg: 14m 15s | Max: 14m 15s | Hits:  99%/757   
      🟩 MSVC14.29          Pass: 100%/1   | Total: 11m 55s | Avg: 11m 55s | Max: 11m 55s | Hits:  99%/757   
      🟩 MSVC14.39          Pass: 100%/2   | Total: 27m 05s | Avg: 13m 32s | Max: 14m 56s | Hits:  99%/1514  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 34s | Avg:  9m 17s | Max:  9m 25s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  2h 21m | Avg:  7m 28s | Max: 34m 26s
      🟩 GCC                Pass: 100%/19  | Total:  3h 25m | Avg: 10m 47s | Max: 53m 13s
      🟩 Intel              Pass: 100%/1   | Total:  7m 00s | Avg:  7m 00s | Max:  7m 00s
      🟩 MSVC               Pass: 100%/4   | Total: 53m 15s | Avg: 13m 18s | Max: 14m 56s | Hits:  99%/3028  
      🟩 NVHPC              Pass: 100%/2   | Total: 18m 34s | Avg:  9m 17s | Max:  9m 25s
    🟩 gpu
      🟩 v100               Pass: 100%/45  | Total:  7h 05m | Avg:  9m 27s | Max: 53m 13s | Hits:  99%/3028  
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  4h 50m | Avg:  7m 26s | Max: 53m 13s | Hits:  99%/3028  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 17m 22s | Avg: 17m 22s | Max: 17m 22s
      🟩 GraphCapture       Pass: 100%/1   | Total: 20m 01s | Avg: 20m 01s | Max: 20m 01s
      🟩 HostLaunch         Pass: 100%/2   | Total: 41m 23s | Avg: 20m 41s | Max: 22m 28s
      🟩 TestGPU            Pass: 100%/2   | Total: 56m 51s | Avg: 28m 25s | Max: 34m 26s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 23m 26s | Avg:  4m 41s | Max:  5m 40s
      🟩 14                 Pass: 100%/4   | Total: 29m 49s | Avg:  7m 27s | Max: 14m 15s | Hits:  99%/757   
      🟩 17                 Pass: 100%/12  | Total:  2h 09m | Avg: 10m 46s | Max: 53m 13s | Hits:  99%/1514  
      🟩 20                 Pass: 100%/24  | Total:  4h 03m | Avg: 10m 08s | Max: 34m 26s | Hits:  99%/757   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 14s | Avg: 6m 37s | Max: 10m 58s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max: 10m 58s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max: 10m 58s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max: 10m 58s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max: 10m 58s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max: 10m 58s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max: 10m 58s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 13m 14s | Avg:  6m 37s | Max: 10m 58s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s
      🟩 Test               Pass: 100%/1   | Total: 10m 58s | Avg: 10m 58s | Max: 10m 58s
    
  • 🟩 python: Pass: 100%/1 | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 94)

# Runner
70 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16

@elstehle elstehle merged commit a67360d into NVIDIA:main Dec 10, 2024
110 checks passed