[BUG]: Potential race in merge sort due to PDL #3131
Comments
So we can overlap BlockSort with a previous kernel (outside CUB). However, I just saw that I missed launching the kernel with the PDL flag.
I tried adding one, but I always ended up crashing. Under compute-sanitizer, the bug disappeared. I discussed with @ahendriksen and it seems the workaround was to
However, the entire kernel is divergent until it exits (because it branches on whether the thread id is smaller than the problem size), so we could only trigger the next launch at the end of the kernel, which already happens implicitly. That is why no explicit trigger call is present.
I think you may have fallen for the same misconception as I did, but @ahendriksen could help me out here:
The only data race I could introduce is if I would put the
My bad, I completely missed that. Thank you for elaborating!
Is this a duplicate?
Type of Bug
Silent Failure
Component
CUB
Describe the bug
#3114 introduced programmatic dependent launch (PDL) into device merge sort. I think it should cause data races. Dependent launch consists of two steps:
cudaTriggerProgrammaticLaunchCompletion()
cudaGridDependencySynchronize()
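For reference, here is a minimal sketch of how the two calls pair up across a primary and a secondary kernel. It is not the CUB merge sort code: `primary_kernel`, `secondary_kernel`, and `run_pdl_pair` are hypothetical names, and I am assuming an sm_90 device and a build line like `nvcc -arch=sm_90`. As far as I understand the CUDA programming guide, `cudaGridDependencySynchronize()` blocks until the primary grid has completed and its writes are visible, so only code placed before that call can overlap with the primary kernel.

```cuda
// Sketch only. Assumed build line: nvcc -arch=sm_90 pdl_sketch.cu
#include <cuda_runtime.h>
#include <vector>

// Primary kernel: each thread writes one element, then signals that the
// dependent (secondary) grid is allowed to start launching.
__global__ void primary_kernel(int* buf, int n)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
  {
    buf[i] = i;
  }
  cudaTriggerProgrammaticLaunchCompletion();
}

// Secondary kernel: must not read buf before the grid dependency sync.
__global__ void secondary_kernel(const int* buf, int* out, int n)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  // Work that does not touch buf could go here and overlap with the primary.
  cudaGridDependencySynchronize();
  if (i < n)
  {
    out[i] = buf[i] + 1;
  }
}

// Hypothetical helper: runs the pair once and returns the number of wrong
// elements, or -1 if any CUDA call fails.
int run_pdl_pair(int n)
{
  int* buf = nullptr;
  int* out = nullptr;
  cudaStream_t stream = nullptr;
  if (cudaMalloc(&buf, n * sizeof(int)) != cudaSuccess) return -1;
  if (cudaMalloc(&out, n * sizeof(int)) != cudaSuccess) { cudaFree(buf); return -1; }
  if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(buf); cudaFree(out); return -1; }

  const int threads = 256;
  const int blocks  = (n + threads - 1) / threads;
  primary_kernel<<<blocks, threads, 0, stream>>>(buf, n);

  // The secondary launch opts in to PDL through a launch attribute.
  cudaLaunchAttribute attr{};
  attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
  attr.val.programmaticStreamSerializationAllowed = 1;

  cudaLaunchConfig_t cfg{};
  cfg.gridDim          = dim3(blocks);
  cfg.blockDim         = dim3(threads);
  cfg.dynamicSmemBytes = 0;
  cfg.stream           = stream;
  cfg.attrs            = &attr;
  cfg.numAttrs         = 1;

  const int* cbuf = buf;
  cudaError_t err = cudaLaunchKernelEx(&cfg, secondary_kernel, cbuf, out, n);
  if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

  int mismatches = -1;
  if (err == cudaSuccess)
  {
    std::vector<int> host(n);
    err = cudaMemcpy(host.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
    {
      mismatches = 0;
      for (int i = 0; i < n; ++i)
      {
        if (host[i] != i + 1) ++mismatches;
      }
    }
  }

  cudaStreamDestroy(stream);
  cudaFree(out);
  cudaFree(buf);
  return mismatches;
}
```

Calling `run_pdl_pair(1 << 16)` from a `main` and checking for a zero return is one way to exercise the sketch. Moving the read of `buf` above `cudaGridDependencySynchronize()` would be the kind of placement that makes the result undefined.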
This causes concurrent execution of these kernels:
As written, merge sort now has the following structure:
Since each pair of primary / secondary kernels is concurrent, we should have a data race between:
We likely want to trigger dependent launch after we write the data, not before.
How to Reproduce
We likely have to find a workload that under-occupies the GPU, so that the last CTA of the primary kernel runs concurrently with the last CTA of the secondary kernel.
Expected behavior
No race in merge sort
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response