
Fix thrust::uniform_int_distribution last 12-bits always being 0 #4393


Draft: wants to merge 1 commit into main


@gonidelis (Member) commented Apr 9, 2025

fixes #758

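Because thrust::uniform_int_distribution relies on uniform_real_distribution, the last 12 bits of its output are always 0, as suggested in the tracking issue. (A double has only a 53-bit significand, so an integer derived from it cannot carry a full 64 bits of randomness.) When I run the code from #758 (but with a full default_random_engine), I get output that looks like: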

...
174c87445d321d00
37171a6adc5c6a00
bd4f637af53d9000
c674004319d0000
c0b13d9f02c4f800
cc428f1d310a4000
fa47ad89e91eb800
717b3fa7c5ed0000
e69896039a625800
dd6696cf759a5800
30ecb360c3b2ce0
9280a8c84a02a000
51d1ff8747480000
...

I am still not entirely sure why the third-to-last digit (and, in some rare cases, even the second-to-last) is sometimes not 0.

If a full engine is used, there will (or at least should) be 3 trailing zero hex digits, i.e. 12 zero bits. This PR fixes that with bit manipulation: after computing the base as before, we draw a new uniformly random 12-bit number and append it to the base by OR-ing it into the low bits. The result is a new uniformly random number.
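A minimal sketch of the append step, assuming (as the description above does) that the low 12 bits of `base` are zero and that the engine's low 12 bits are uniform. `append_low_12_bits` is a hypothetical helper, not Thrust's API, and `std::mt19937_64` stands in for a full 64-bit engine.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper, not Thrust's API. Precondition (assumed, as in the PR
// description): the low 12 bits of `base` are zero.
template <class URNG>
std::uint64_t append_low_12_bits(std::uint64_t base, URNG& urng) {
  constexpr std::uint64_t low_mask = 0xFFF;  // the 12 low bits
  assert((base & low_mask) == 0 && "low 12 bits of base must be zero");
  // Take 12 bits from one engine draw; assumes the engine's low bits are uniform.
  const std::uint64_t lower_bits = static_cast<std::uint64_t>(urng()) & low_mask;
  // The upper bits of `base` are unchanged; only the zeroed low bits are filled in.
  return base | lower_bits;
}
```

For example, with `std::mt19937_64 eng(42);`, calling `append_low_12_bits(0xbd4f637af53d9000ULL, eng)` keeps the upper 52 bits and randomizes the last three hex digits. The precondition matters: if `base` already had non-zero low bits, OR-ing in random bits would bias them toward 1.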

Note

The user needs to take care to use a proper engine. The code from the issue uses a 48-bit engine; that's why there are 4 trailing 0s, and not the expected 3, when I run the source from the issue with ranlux48. There is a detailed explanation in the comment under the issue.

...
4d283257ce8e0000
c9a90e6b5ff70000
44b01f0ccee50000
9723c9f4702a0000
603b1435411f0000
511e574a66b70000
ad200f0af9a10000
e2ea32f7f0010000
c04d33ea804e0000
9363937ffa6b0000
ec1867ba4e5c0000
86751aecd9780000
c37205ce33b70000
9ead0d2b22500000
...
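The four trailing zeros can be reproduced with a simplified model: if the double is formed from a single 48-bit draw and then scaled to the 64-bit range, the result is exactly that draw shifted left by 16 bits. This construction is an assumption for illustration, not taken from Thrust's source; `scaled_from_48_bit_draw` is a hypothetical helper, and `std::ranlux48` is the 48-bit engine mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Simplified model (an assumption, not Thrust's actual code): build a double in
// [0, 1) from one draw of a 48-bit engine, then scale it to the 64-bit range.
std::uint64_t scaled_from_48_bit_draw(std::ranlux48& eng) {
  static_assert(std::ranlux48::max() == (std::uint64_t{1} << 48) - 1,
                "ranlux48 yields 48-bit values");
  // Dividing by 2^48 is exact: the draw fits in a double's 53-bit significand.
  const double u = static_cast<double>(eng()) / 281474976710656.0;  // 2^48
  assert(u >= 0.0 && u < 1.0);
  // Multiplying by 2^64 is also exact, so the result is draw << 16.
  return static_cast<std::uint64_t>(u * 18446744073709551616.0);    // 2^64
}
```

Under this model every result has its low 16 bits cleared, i.e. it ends in at least four hex zeros, matching the ranlux48 output above.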

@gonidelis added the bug (Something isn't working right.) label Apr 9, 2025
@gonidelis requested a review from a team as a code owner April 9, 2025 22:50
@gonidelis requested a review from elstehle April 9, 2025 22:50
github-project-automation bot moved this to Todo in CCCL Apr 9, 2025
cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 9, 2025
@gonidelis requested reviews from alliepiper and miscco April 9, 2025 22:50
@gonidelis force-pushed the uniform_int_distr branch from 2a523e3 to 9768035
@djns99 (Contributor) commented Apr 9, 2025

@gonidelis Thanks for this!

I am still not entirely sure why 3rd to last (even the 2nd in some rare cases) digit sometimes is not 0.

This happens when it produces a small value that the mantissa can represent more precisely; notice that there is an extra leading zero (omitted by the print) in those cases.
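The explanation above follows from how doubles are spaced: the gap between adjacent doubles shrinks with magnitude, so smaller results land on a finer integer grid. A small sketch under the simplifying assumption that the 64-bit result is `u * 2^64` for a double `u` in `[0, 1)`; the exact zero-bit count in Thrust depends on how its `uniform_real_distribution` builds `u`, which this does not reproduce. `scaled_gap` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the gap between u and the next larger double, scaled
// by 2^64. Assuming the 64-bit result is computed as u * 2^64, this is the
// spacing of the integers reachable near u.
double scaled_gap(double u) {
  assert(u >= 0.0 && u < 1.0);
  const double two_pow_64 = 18446744073709551616.0;  // 2^64, exact as a double
  return (std::nextafter(u, 1.0) - u) * two_pow_64;
}
```

For example, `scaled_gap(0.75)` is 2048, while `scaled_gap(0.75 / 16)`, whose 64-bit image begins with a hex 0, is only 128, so fewer low bits are forced to zero.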


// Add random lower bits to fix last 12-bits always being 0.
result_type lower_bits = static_cast<result_type>(urng()) & 0xFFF; // Get 12 **uniformly** random bits
return base | lower_bits; // Combine them
Contributor:

Is there a particular reason we use the real_dist instead of an algorithm that generates integers directly?

Also, won't this break for int distributions narrower than 12 bits?
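For comparison, one common way to generate integers directly is rejection sampling on raw engine output, which uses every bit and works for any range width. This is a sketch of that general technique, not Thrust's implementation; `uniform_u64` is a hypothetical name, and it assumes an engine covering the full `[0, 2^64 - 1]` range, such as `std::mt19937_64`.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical sketch: a uniform integer in [a, b] drawn directly from a
// 64-bit engine by rejection sampling, with no floating point involved.
template <class URNG>
std::uint64_t uniform_u64(std::uint64_t a, std::uint64_t b, URNG& urng) {
  static_assert(URNG::min() == 0 && URNG::max() == UINT64_MAX,
                "sketch assumes a full 64-bit engine");
  assert(a <= b);
  const std::uint64_t span = b - a;  // number of values minus one
  if (span == UINT64_MAX) {
    // Full range: every engine output is already a valid, uniform result.
    return static_cast<std::uint64_t>(urng());
  }
  const std::uint64_t n = span + 1;
  // threshold = 2^64 mod n; accepting only draws >= threshold leaves a count
  // of candidates that is an exact multiple of n, so x % n is unbiased.
  const std::uint64_t threshold = (0 - n) % n;
  std::uint64_t x;
  do {
    x = static_cast<std::uint64_t>(urng());
  } while (x < threshold);
  return a + x % n;
}
```

Because no value passes through a double, the low bits of the result are as random as the engine's, and narrow ranges such as `[10, 20]` need no special casing.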

Member Author:

Is there a particular reason we use the real_dist instead of an algorithm that generates integers directly?

It's legacy; unfortunately, I don't know.

Also, won't this break for int distributions narrower than 12 bits?

Yes, good catch. Fixing it now.
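The failure mode is easy to see in the worst case: for a range such as `[0, 9]`, OR-ing 12 bits into any base can produce up to `9 | 0xFFF == 4095`. A compile-time sketch with a hypothetical helper `append_stays_in_range` (not part of Thrust):

```cpp
#include <cstdint>

// Hypothetical check illustrating the review comment: appending 12 random
// low bits can only stay in range if even the worst case (all 12 bits set)
// does not exceed the distribution's upper bound.
constexpr std::uint64_t low_mask = 0xFFF;  // the 12 appended bits

constexpr bool append_stays_in_range(std::uint64_t base, std::uint64_t max_value) {
  return (base | low_mask) <= max_value;
}

// A full 64-bit range has room for the extra bits...
static_assert(append_stays_in_range(0xbd4f637af53d9000ULL, UINT64_MAX), "fits");
// ...but a 4-bit range such as [0, 9] does not: 9 | 0xFFF == 4095.
static_assert(!append_stays_in_range(9, 9), "escapes [0, 9]");
```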

Contributor

🟨 CI finished in 4h 52m: Pass: 88%/101 | Total: 2d 00h | Avg: 29m 00s | Max: 1h 30m | Hits: 78%/121761
  • 🟨 thrust: Pass: 85%/47 | Total: 1d 02h | Avg: 33m 42s | Max: 1h 30m | Hits: 65%/71033

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  84%/45  | Total:  1d 01h | Avg: 33m 51s | Max:  1h 30m | Hits:  65%/67480 
      🟩 arm64              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 07s | Max: 31m 30s | Hits:  63%/3553  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total:  3h 12m | Avg: 38m 28s | Max: 59m 14s | Hits:  71%/8876  
      🔍 12.8               Pass:  83%/42  | Total: 23h 11m | Avg: 33m 08s | Max:  1h 30m | Hits:  64%/62157 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 57m 28s | Avg: 28m 44s | Max: 31m 09s | Hits:  63%/3552  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 12m | Avg: 38m 28s | Max: 59m 14s | Hits:  71%/8876  
      🔍 nvcc12.8           Pass:  82%/40  | Total: 22h 14m | Avg: 33m 21s | Max:  1h 30m | Hits:  64%/58605 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 57m 28s | Avg: 28m 44s | Max: 31m 09s | Hits:  63%/3552  
      🔍 nvcc               Pass:  84%/45  | Total:  1d 01h | Avg: 33m 55s | Max:  1h 30m | Hits:  65%/67481 
    🔍 sm: 90 🔍
      🔍 90                 Pass:  50%/2   | Total: 31m 14s | Avg: 15m 37s | Max: 19m 39s | Hits:  63%/1777  
      🟩 90;90a;100         Pass: 100%/1   | Total: 29m 35s | Avg: 29m 35s | Max: 29m 35s | Hits:  83%/1777  
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/21  | Total: 13h 48m | Avg: 39m 28s | Max:  1h 22m | Hits:  65%/37287 
      🔍 20                 Pass:  75%/24  | Total: 11h 56m | Avg: 29m 50s | Max:  1h 30m | Hits:  66%/31969 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 37s | Max: 34m 33s | Hits:  68%/7104  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 12s | Max: 32m 02s | Hits:  63%/3552  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 44s | Max: 34m 13s | Hits:  63%/3552  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 06s | Max: 33m 35s | Hits:  63%/3552  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 49s | Max: 34m 51s | Hits:  63%/3552  
      🟨 Clang19            Pass:  71%/7   | Total:  2h 49m | Avg: 24m 09s | Max: 33m 04s | Hits:  63%/8880  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 03m | Avg: 31m 45s | Max: 31m 57s | Hits:  69%/3554  
      🟩 GCC8               Pass: 100%/1   | Total: 30m 18s | Avg: 30m 18s | Max: 30m 18s | Hits:  63%/1777  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 11m | Avg: 35m 42s | Max: 35m 50s | Hits:  69%/3554  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 04s | Max: 32m 27s | Hits:  63%/3554  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 29s | Max: 32m 41s | Hits:  63%/3554  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 15s | Max: 36m 15s | Hits:  63%/3554  
      🟨 GCC13              Pass:  60%/10  | Total:  3h 34m | Avg: 21m 27s | Max: 34m 14s | Hits:  70%/10662 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 54s | Max: 59m 14s | Hits:  66%/3540  
      🟨 MSVC14.42          Pass:  66%/3   | Total:  2h 34m | Avg: 51m 23s | Max:  1h 03m | Hits:  61%/3540  
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  2h 53m | Avg:  1h 26m | Max:  1h 30m | Hits:  61%/3552  
    🟨 cxx_family
      🟨 Clang              Pass:  89%/19  | Total:  9h 19m | Avg: 29m 26s | Max: 34m 51s | Hits:  64%/30192 
      🟨 GCC                Pass:  80%/21  | Total:  9h 39m | Avg: 27m 35s | Max: 36m 15s | Hits:  67%/30209 
      🟨 MSVC               Pass:  80%/5   | Total:  4h 32m | Avg: 54m 24s | Max:  1h 03m | Hits:  64%/7080  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 53m | Avg:  1h 26m | Max:  1h 30m | Hits:  61%/3552  
    🟨 gpu
      🟨 h100               Pass:  50%/2   | Total: 31m 14s | Avg: 15m 37s | Max: 19m 39s | Hits:  63%/1777  
      🟩 rtx2080            Pass: 100%/35  | Total: 22h 04m | Avg: 37m 50s | Max:  1h 30m | Hits:  65%/62156 
      🟨 rtx4090            Pass:  40%/10  | Total:  3h 48m | Avg: 22m 49s | Max:  1h 02m | Hits:  68%/7100  
    🟨 jobs
      🟩 Build              Pass: 100%/40  | Total:  1d 00h | Avg: 37m 23s | Max:  1h 30m | Hits:  65%/71033 
      🟥 TestCPU            Pass:   0%/3   | Total: 43m 55s | Avg: 14m 38s | Max: 27m 57s
      🟥 TestGPU            Pass:   0%/4   | Total: 44m 38s | Avg: 11m 09s | Max: 11m 35s
    🟨 cmake_options
      🟨 -DTHRUST_DISPATCH_TYPE=Force32bit Pass:  50%/2   | Total: 38m 56s | Avg: 19m 28s | Max: 27m 40s | Hits:  63%/1777  
    
  • 🟨 cub: Pass: 89%/47 | Total: 20h 06m | Avg: 25m 40s | Max: 39m 05s | Hits: 96%/50400

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  88%/45  | Total: 19h 10m | Avg: 25m 33s | Max: 39m 05s | Hits:  96%/47942 
      🟩 arm64              Pass: 100%/2   | Total: 55m 56s | Avg: 27m 58s | Max: 30m 26s | Hits:  95%/2458  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total:  2h 24m | Avg: 28m 54s | Max: 38m 05s | Hits:  95%/5974  
      🔍 12.8               Pass:  88%/42  | Total: 17h 41m | Avg: 25m 16s | Max: 39m 05s | Hits:  96%/44426 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 31m 06s | Avg: 15m 33s | Max: 15m 37s | Hits:  96%/2120  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 24m | Avg: 28m 54s | Max: 38m 05s | Hits:  95%/5974  
      🔍 nvcc12.8           Pass:  87%/40  | Total: 17h 10m | Avg: 25m 46s | Max: 39m 05s | Hits:  95%/42306 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 31m 06s | Avg: 15m 33s | Max: 15m 37s | Hits:  96%/2120  
      🔍 nvcc               Pass:  88%/45  | Total: 19h 35m | Avg: 26m 07s | Max: 39m 05s | Hits:  95%/48280 
    🔍 sm: 90 🔍
      🔍 90                 Pass:  66%/3   | Total:  1h 00m | Avg: 20m 11s | Max: 26m 28s | Hits:  97%/2458  
      🟩 90;90a;100         Pass: 100%/1   | Total: 23m 10s | Avg: 23m 10s | Max: 23m 10s | Hits:  95%/1229  
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/21  | Total:  9h 21m | Avg: 26m 44s | Max: 38m 05s | Hits:  95%/25026 
      🔍 20                 Pass:  80%/26  | Total: 10h 44m | Avg: 24m 47s | Max: 39m 05s | Hits:  96%/25374 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 39m | Avg: 24m 58s | Max: 27m 16s | Hits:  95%/4924  
      🟩 Clang15            Pass: 100%/2   | Total: 48m 13s | Avg: 24m 06s | Max: 24m 09s | Hits:  95%/2458  
      🟩 Clang16            Pass: 100%/2   | Total: 50m 15s | Avg: 25m 07s | Max: 26m 49s | Hits:  95%/2458  
      🟩 Clang17            Pass: 100%/2   | Total: 49m 05s | Avg: 24m 32s | Max: 25m 11s | Hits:  95%/2458  
      🟩 Clang18            Pass: 100%/2   | Total: 46m 50s | Avg: 23m 25s | Max: 23m 36s | Hits:  95%/2458  
      🟨 Clang19            Pass:  85%/7   | Total:  2h 32m | Avg: 21m 50s | Max: 25m 30s | Hits:  96%/7036  
      🟩 GCC7               Pass: 100%/2   | Total: 50m 02s | Avg: 25m 01s | Max: 25m 45s | Hits:  95%/2462  
      🟩 GCC8               Pass: 100%/1   | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s | Hits:  95%/1231  
      🟩 GCC9               Pass: 100%/2   | Total: 53m 43s | Avg: 26m 51s | Max: 28m 38s | Hits:  95%/2462  
      🟩 GCC10              Pass: 100%/2   | Total: 48m 24s | Avg: 24m 12s | Max: 24m 55s | Hits:  95%/2462  
      🟩 GCC11              Pass: 100%/2   | Total: 51m 02s | Avg: 25m 31s | Max: 27m 19s | Hits:  95%/2458  
      🟩 GCC12              Pass: 100%/2   | Total: 49m 18s | Avg: 24m 39s | Max: 24m 48s | Hits:  95%/2458  
      🟨 GCC13              Pass:  63%/11  | Total:  4h 22m | Avg: 23m 54s | Max: 30m 26s | Hits:  96%/8603  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 15m | Avg: 37m 41s | Max: 38m 05s | Hits:  96%/2100  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 16m | Avg: 38m 29s | Max: 39m 05s | Hits:  96%/2100  
      🟩 NVHPC25.3          Pass: 100%/2   | Total:  1h 07m | Avg: 33m 49s | Max: 34m 05s | Hits:  94%/2272  
    🟨 cxx_family
      🟨 Clang              Pass:  94%/19  | Total:  7h 27m | Avg: 23m 32s | Max: 27m 16s | Hits:  96%/21792 
      🟨 GCC                Pass:  81%/22  | Total:  8h 59m | Avg: 24m 30s | Max: 30m 26s | Hits:  95%/22136 
      🟩 MSVC               Pass: 100%/4   | Total:  2h 32m | Avg: 38m 05s | Max: 39m 05s | Hits:  96%/4200  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 49s | Max: 34m 05s | Hits:  94%/2272  
    🟨 gpu
      🟨 h100               Pass:  66%/3   | Total:  1h 00m | Avg: 20m 11s | Max: 26m 28s | Hits:  97%/2458  
      🟩 rtx2080            Pass: 100%/36  | Total: 15h 48m | Avg: 26m 20s | Max: 39m 05s | Hits:  95%/43026 
      🟨 rtxa6000           Pass:  50%/8   | Total:  3h 17m | Avg: 24m 43s | Max: 29m 27s | Hits:  97%/4916  
    🟨 jobs
      🟩 Build              Pass: 100%/39  | Total: 16h 48m | Avg: 25m 51s | Max: 39m 05s | Hits:  95%/46713 
      🟥 DeviceLaunch       Pass:   0%/1   | Total: 26m 49s | Avg: 26m 49s | Max: 26m 49s
      🟥 GraphCapture       Pass:   0%/1   | Total: 19m 49s | Avg: 19m 49s | Max: 19m 49s
      🟥 HostLaunch         Pass:   0%/3   | Total:  1h 21m | Avg: 27m 08s | Max: 29m 27s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 09m | Avg: 23m 18s | Max: 23m 50s | Hits:  99%/3687  
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 22m 38s | Avg: 5m 39s | Max: 7m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 58s | Avg:  6m 29s | Max:  7m 36s
      🟩 arm64              Pass: 100%/2   | Total:  9m 40s | Avg:  4m 50s | Max:  5m 26s
    🟩 ctk
      🟩 12.8               Pass: 100%/4   | Total: 22m 38s | Avg:  5m 39s | Max:  7m 36s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/4   | Total: 22m 38s | Avg:  5m 39s | Max:  7m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 22m 38s | Avg:  5m 39s | Max:  7m 36s
    🟩 cxx
      🟩 NVHPC25.3          Pass: 100%/4   | Total: 22m 38s | Avg:  5m 39s | Max:  7m 36s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 22m 38s | Avg:  5m 39s | Max:  7m 36s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 22m 38s | Avg:  5m 39s | Max:  7m 36s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 22m 38s | Avg:  5m 39s | Max:  7m 36s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  7m 36s
      🟩 20                 Pass: 100%/2   | Total:  9m 36s | Avg:  4m 48s | Max:  5m 22s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 27m 25s | Avg: 13m 42s | Max: 25m 10s | Hits: 98%/328

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 27m 25s | Avg: 13m 42s | Max: 25m 10s | Hits:  98%/328   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 27m 25s | Avg: 13m 42s | Max: 25m 10s | Hits:  98%/328   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 27m 25s | Avg: 13m 42s | Max: 25m 10s | Hits:  98%/328   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 27m 25s | Avg: 13m 42s | Max: 25m 10s | Hits:  98%/328   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 27m 25s | Avg: 13m 42s | Max: 25m 10s | Hits:  98%/328   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 27m 25s | Avg: 13m 42s | Max: 25m 10s | Hits:  98%/328   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 27m 25s | Avg: 13m 42s | Max: 25m 10s | Hits:  98%/328   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s | Hits:  98%/164   
      🟩 Test               Pass: 100%/1   | Total: 25m 10s | Avg: 25m 10s | Max: 25m 10s | Hits:  98%/164   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 29m | Avg: 1h 29m | Max: 1h 29m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 29m | Avg:  1h 29m | Max:  1h 29m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 29m | Avg:  1h 29m | Max:  1h 29m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 29m | Avg:  1h 29m | Max:  1h 29m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 29m | Avg:  1h 29m | Max:  1h 29m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 29m | Avg:  1h 29m | Max:  1h 29m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 29m | Avg:  1h 29m | Max:  1h 29m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 29m | Avg:  1h 29m | Max:  1h 29m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 29m | Avg:  1h 29m | Max:  1h 29m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 101)

# Runner
72 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-arm64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@gonidelis marked this pull request as draft April 10, 2025 16:54

copy-pr-bot bot commented Apr 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cccl-authenticator-app bot moved this from In Review to In Progress in CCCL Apr 10, 2025
@gonidelis (Member Author):

#4410 should fix the bugs mentioned above. This PR will alias the work in #4410 for Thrust to expose a proper fix for the bug.

Labels: bug (Something isn't working right.)
Projects: CCCL (Status: In Progress)
Development: successfully merging this pull request may close these issues:
thrust::uniform_int_distribution<uint64_t> exclusively produces multiples of 4096
2 participants