[cuda.cooperative] Add block.load and block.store. #2693
Conversation
```diff
 def dtype(self):
-    return numba.types.Array(self.value_dtype, 1, 'C')
+    return numba.types.Array(self.value_dtype, 1, 'A')
```
I made this change because I was getting failures in `array_to_array` in Numba, which gets called when there's an implicit cast between array types. It's unhappy if you're casting to an array type that doesn't have the `'A'` layout. The `'A'` layout means the array could be in any layout, so this makes sense: you can cast a `'C'` (row-major) or `'F'` (column-major) array to `'A'`, but not vice versa. It's akin to being able to cast any pointer to `void*` implicitly, but needing an explicit cast to go back the other way.

It's also notable that I once ended up at this assertion when I had a `float64` array that was being converted to a `float32` array. It seems like that should have been caught and reported earlier in Numba. Perhaps Graham Markall can provide some insight.

cc @rwgk @emcastillo for vis
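For context, here is a minimal standalone sketch (not from this PR) of the layout rule in question: Numba will implicitly cast a `'C'`- or `'F'`-layout array to an `'A'` (any-layout) array type, but not the reverse. The function names and shapes below are arbitrary.

```python
import numpy as np
from numba import njit, types

# A 2-D float32 array type with 'A' (any) layout accepts both C- and F-ordered inputs.
any_layout = types.Array(types.float32, 2, 'A')

@njit((any_layout,))
def first_elem(a):
    return a[0, 0]

c_arr = np.zeros((4, 4), dtype=np.float32)   # C-contiguous ('C' layout)
f_arr = np.asfortranarray(c_arr)             # F-contiguous ('F' layout)

first_elem(c_arr)   # OK: 'C' casts to 'A'
first_elem(f_arr)   # OK: 'F' casts to 'A'

# A 'C'-layout signature, by contrast, only accepts C-ordered input.
c_layout = types.Array(types.float32, 2, 'C')

@njit((c_layout,))
def first_elem_c(a):
    return a[0, 0]

first_elem_c(c_arr)     # OK
# first_elem_c(f_arr)   # raises TypeError: 'F'-ordered input doesn't match the 'C' layout
```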
/ok to test
🟩 CI finished in 24m 17s: Pass: 100%/1 | Total: 24m 17s | Avg: 24m 17s | Max: 24m 17s

| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| CUB | |
| Thrust | |
| CUDA Experimental | |
| +/- | python |
| CCCL C Parallel Library | |
| Catch2Helper | |
Modifications in project or dependencies?

| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| CUB | |
| Thrust | |
| CUDA Experimental | |
| +/- | python |
| CCCL C Parallel Library | |
| Catch2Helper | |
🏃 Runner counts (total jobs: 1)

| # | Runner |
|---|---|
| 1 | linux-amd64-gpu-v100-latest-1 |
* [cuda.cooperative] Add block.load and block.store.

  Co-authored-by: Georgy Evtushenko <[email protected]>
Description

This PR adds `cub::BlockLoad` and `cub::BlockStore` to `cuda.cooperative` as `block.load` and `block.store`, respectively.

Both of these algorithms take their input/output as a C++ iterator. To support them in `cuda.cooperative`, I added a `DependentPointer` templating facility based on the existing `Pointer` and `DependentReference`. This seems to work fine; however, I ran into a weird issue deep in Numba where an array-to-array conversion failed because the type of array produced by `Pointer` had a Numba layout of `'C'` instead of `'A'` (which means any layout). Changing the array type produced by `Pointer` to `'A'` seems to have done the trick. It's also notable that I once ended up at this assertion when I had a `float64` array that was being converted to a `float32` array. It seems like that should have been caught and reported earlier in Numba.

I exposed the CUB load/store algorithm parameters, e.g. `cub::BLOCK_LOAD_TRANSPOSE`. I decided to have these parameters passed as strings in Python instead of as a Python enum or named objects. This more closely matches the parameter-passing style of NumPy (`layout='C'`, `dtype='float32'`), is less verbose, and doesn't preclude us from adding enums or named objects later. I chose not to give the names prefixes and to use the same names for load and store, e.g. `'transpose'`.

Currently I'm having an issue with `block.store` in my softmax example, but I suspect it's just a bug in my code.
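For orientation, here is a hedged sketch of what calling these primitives from Python might look like. The module path, argument order, `algorithm=` keyword, `files` attribute, and temp-storage handling are all assumptions modeled on the existing `cuda.cooperative` block primitives and on the description above, not the exact API added by this PR.

```python
# Hedged sketch only: names and signatures below are assumptions, not the PR's API.
import numpy as np
import numba
from numba import cuda
import cuda.cooperative.experimental as cudax  # assumed module path

threads_per_block = 128
items_per_thread = 4

# Algorithms are selected by string (e.g. 'transpose' for cub::BLOCK_LOAD_TRANSPOSE /
# cub::BLOCK_STORE_TRANSPOSE), mirroring NumPy-style string parameters.
block_load = cudax.block.load(numba.float32, threads_per_block, items_per_thread,
                              algorithm='transpose')    # assumed signature
block_store = cudax.block.store(numba.float32, threads_per_block, items_per_thread,
                                algorithm='transpose')  # assumed signature

@cuda.jit(link=block_load.files + block_store.files)    # assumed `files` attribute
def copy_kernel(d_in, d_out):
    # Each thread stages items_per_thread elements in registers,
    # then writes them back out in the same blocked arrangement.
    thread_items = cuda.local.array(items_per_thread, numba.float32)
    block_load(d_in, thread_items)
    block_store(d_out, thread_items)

d_in = cuda.to_device(np.arange(threads_per_block * items_per_thread, dtype=np.float32))
d_out = cuda.device_array_like(d_in)
copy_kernel[1, threads_per_block](d_in, d_out)
```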
Checklist

- `Pointer` array layout change from `'A'` to `'C'`.
- `block.store` in softmax example.