
[BUG]: Design a fix for temporary storage alignment in cuda.cooperative module #2558

Open
1 task done
gevtushenko opened this issue Oct 10, 2024 · 0 comments
Labels
bug Something isn't working right.


@gevtushenko
Collaborator

Is this a duplicate?

Type of Bug

Runtime Error

Component

Not sure

Describe the bug

The cuda.cooperative API currently does not specify the alignment of the temporary storage that its primitives require.

How to Reproduce

This leads to bugs such as the following:

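```python
import cupy as cp
import numba
import numba.cuda
import cuda.cooperative.experimental as cudax

cx_int_sum = cudax.block.sum(numba.int32, 256)
int_sum_storage_bytes = cx_int_sum.temp_storage_bytes
cx_float_sum = cudax.block.sum(dtype=numba.float64, threads_in_block=256)
float_sum_storage_bytes = cx_float_sum.temp_storage_bytes

# Device code for both primitives must be linked into the kernel.
cx_files = cx_float_sum.files + cx_int_sum.files


@numba.cuda.jit(
    "void(int32[:], float64[:])",
    link=cx_files
)
def kernel(args_in, args_out):
    # Both temporary storages are declared as byte arrays, so nothing
    # guarantees the alignment each reduction expects.
    int_sum_storage = numba.cuda.shared.array(shape=int_sum_storage_bytes, dtype=numba.uint8)
    float_sum_storage = numba.cuda.shared.array(shape=float_sum_storage_bytes, dtype=numba.uint8)
    tix = numba.cuda.threadIdx.x
    # Do the int32 reduction first, then the float64 reduction.
    val = cx_int_sum(int_sum_storage, args_in[tix])
    val2 = cx_float_sum(float_sum_storage, numba.float64(args_in[tix]))
    args_out[tix] = val + val2


# Launch one block of 256 threads (assumed launch code; not part of the original report).
d_in = cp.arange(256, dtype=cp.int32)
d_out = cp.empty(256, dtype=cp.float64)
kernel[1, 256](d_in, d_out)
cp.cuda.runtime.deviceSynchronize()
```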

Because both shared memory allocations are declared at uint8 granularity, the second one is not guaranteed to have the alignment that the float64 reduction requires, leading to:

```
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorMisalignedAddress: misaligned address
```
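To make the failure mode concrete, here is a small pure-Python sketch of the offset arithmetic. It assumes, for illustration only, that the two shared arrays are placed back to back starting from an 8-byte-aligned base, and it uses a hypothetical int32 storage size of 36 bytes; the real size comes from temp_storage_bytes, and Numba's actual placement of shared arrays is not guaranteed to follow this model.

```python
# Hypothetical size, chosen only for illustration; the real value comes
# from cx_int_sum.temp_storage_bytes.
INT_SUM_BYTES = 36   # not a multiple of 8
FLOAT64_ALIGN = 8    # alignment of a float64


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of a power-of-two alignment."""
    return (offset + alignment - 1) & ~(alignment - 1)


# Byte-granular layout (assumed): the second buffer starts right after the first.
naive_float_offset = INT_SUM_BYTES
# Aligned layout: pad so the second buffer starts on an 8-byte boundary.
aligned_float_offset = align_up(INT_SUM_BYTES, FLOAT64_ALIGN)

print(naive_float_offset, naive_float_offset % FLOAT64_ALIGN)      # 36 4
print(aligned_float_offset, aligned_float_offset % FLOAT64_ALIGN)  # 40 0
```

A nonzero remainder in the first case is exactly the situation in which an 8-byte access into the float64 storage faults with a misaligned address.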

Expected behavior

#2527 might avoid the issue in some cases, but the cuda.cooperative module needs a proper solution for temporary storage alignment.
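One possible direction, sketched here only as an assumption and not as the module's API: if each primitive reported an alignment requirement alongside temp_storage_bytes, a host-side planner could pack several temporary storages into one shared buffer at aligned offsets. The StorageRequest class, the plan_layout helper, and the sizes and alignments below are all hypothetical. The sketch covers only the layout arithmetic; how the kernel would allocate the combined buffer with sufficient alignment and hand each primitive its slice is left open.

```python
from dataclasses import dataclass


@dataclass
class StorageRequest:
    """Hypothetical: one primitive's temporary storage requirements."""
    name: str
    size_bytes: int
    alignment: int  # assumed to be a power of two


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of a power-of-two alignment."""
    return (offset + alignment - 1) & ~(alignment - 1)


def plan_layout(requests):
    """Assign each request an aligned byte offset inside one shared buffer.

    Returns (offsets, total_bytes, buffer_alignment); the combined buffer
    must itself be allocated with at least buffer_alignment.
    """
    offsets = {}
    offset = 0
    buffer_alignment = 1
    for req in requests:
        offset = align_up(offset, req.alignment)
        offsets[req.name] = offset
        offset += req.size_bytes
        buffer_alignment = max(buffer_alignment, req.alignment)
    return offsets, offset, buffer_alignment


# Hypothetical sizes and alignments for the two reductions in the report.
offsets, total_bytes, buffer_alignment = plan_layout([
    StorageRequest("int_sum", size_bytes=36, alignment=4),
    StorageRequest("float_sum", size_bytes=72, alignment=8),
])
print(offsets, total_bytes, buffer_alignment)
# {'int_sum': 0, 'float_sum': 40} 112 8
```

The planner also returns the largest alignment so the combined buffer itself can be allocated with at least that alignment. Ordering requests by descending alignment can reduce padding, but it is not needed for correctness.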

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

@gevtushenko gevtushenko added the bug Something isn't working right. label Oct 10, 2024
@github-project-automation github-project-automation bot moved this to Todo in CCCL Oct 10, 2024
Projects
Status: Todo
Development

No branches or pull requests

1 participant