[BUG]: thrust::inclusive_scan not working correctly when -rdc=true #2449

Open
1 task done
Olli1080 opened this issue Sep 21, 2024 · 1 comment
Labels
bug Something isn't working right.

Comments


Olli1080 commented Sep 21, 2024

Is this a duplicate?

Type of Bug

Silent Failure

Component

Thrust

Describe the bug

thrust::inclusive_scan produces incorrect results for certain vector sizes: it seems to stop partway through, e.g. [1,2,3,x,x,...,x], where x is the value the result vector was initialized with.
After some experiments, -rdc=true appears to cause this behaviour. Switching from Debug to Release gives exactly the same results.

Visual Studio Community 2022, version 17.11.4

CUDA 12.6 Update 1

How to Reproduce

  1. Download the reproduction repo at https://github.com/Olli1080/Cuda-Bug.git
  2. Compile and run the solution
  3. It prints the size ranges for which inclusive_scan fails
  4. Disable GenerateRelocatableDeviceCode (i.e. -rdc=true) and rebuild
  5. No output is generated, indicating that everything works

Log generated on my system:

at iteration: 1537; 0 != 1537
[1537 - 1921]
at iteration: 3073; 0 != 3073
[3073 - 3841]
at iteration: 4609; 0 != 4609
[4609 - 5761]
at iteration: 6145; 0 != 6145
[6145 -

Expected behavior

inclusive_scan should work no matter the status of GenerateRelocatableDeviceCode

Reproduction link

https://github.com/Olli1080/Cuda-Bug.git

Operating System

Windows 11 Pro 10.0.22631

nvidia-smi output

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.94                 Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090      WDDM  |   00000000:01:00.0  On |                  Off |
|  0%   38C    P8             22W /  450W |    1460MiB /  24564MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

NVCC version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:26:51_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

@Olli1080 Olli1080 added the bug Something isn't working right. label Sep 21, 2024
@github-project-automation github-project-automation bot moved this to Todo in CCCL Sep 21, 2024
@Olli1080
Author

After some more testing, I got the same results with CUDA 12.5 Patch 1 and 12.3 Patch 2 (I didn't check 12.4).
The same behaviour can also be triggered without -rdc=true by using -ewp instead.

The issue does not appear if I set SM >= 60 and compute >= 60. If these settings are left empty, they default to SM_52, compute_52.
The issue is also present for SM_53, compute_53.
I've tested all configurations with 52 <= SM <= 89 and 52 <= compute <= 89.
