[BUG]: Design a fix for temporary storage alignment in cuda.cooperative module #2558
Open
1 task done
Labels
bug
Is this a duplicate?
Type of Bug
Runtime Error
Component
Not sure
Describe the bug
The cuda.cooperative API does not currently specify the alignment of the temporary storage it requires.
How to Reproduce
This leads to bugs like the following one:
Because both allocations of shared memory are made at `uint8` granularity, the second one is not properly aligned, leading to:
Expected behavior
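As a rough illustration of the difference between packed and aligned placement, here is a minimal host-side sketch in plain Python. It does not use the cuda.cooperative API; the helper names `align_up` and `layout`, as well as the sizes and the 16-byte alignment, are assumptions made up for this example.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


def layout(requests, aligned=True):
    """Return the byte offset of each (size, alignment) request in one buffer.

    With aligned=False the requests are packed back to back, which mimics
    allocating everything at single-byte (uint8) granularity.
    """
    offsets, cursor = [], 0
    for size, alignment in requests:
        start = align_up(cursor, alignment) if aligned else cursor
        offsets.append(start)
        cursor = start + size
    return offsets


# Hypothetical request list: a 5-byte user buffer followed by temporary
# storage that is assumed to need 16-byte alignment.
requests = [(5, 1), (64, 16)]

packed = layout(requests, aligned=False)  # second buffer starts at byte 5
padded = layout(requests, aligned=True)   # second buffer starts at byte 16

print("packed offsets:", packed)
print("padded offsets:", padded)
```

In the packed layout the second buffer starts at an offset that is not a multiple of its required alignment, which is the situation described above. A proper fix would presumably have the API report the alignment it needs so that callers (or the allocator) can pad accordingly.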
#2527 might help avoid the issue in some cases, but we need a proper solution for temporary storage alignment in the `cuda.cooperative` module.
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response