To expose algorithms like transform, fill, copy, etc., we have to implement a C version of the cub::DeviceFor::ForEachN algorithm. Overall, development can be informed by the C version of the reduce API. A few key points:
- We can ignore cub::DeviceFor::Bulk, cub::DeviceFor::ForEachCopy*, and cub::DeviceFor::ForEach for now and limit this issue to cub::DeviceFor::ForEachN.
- We can ignore the cub::DeviceFor::ForEachN overload that requires temporary storage.
- We can limit offset types to always be std::int64_t.
- We can limit the tuning policy to always be 256 threads per block and 2 items per thread.
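To make the fixed tuning policy and the std::int64_t offset type concrete, here is a minimal host-side sketch of the tile arithmetic they imply. It is not the proposed API and not CUB's actual dispatch code. The helper names `num_tiles` and `items_in_tile` are hypothetical, and the assumption that one thread block handles one tile of 256 * 2 consecutive items is made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Fixed tuning taken from the key points: 256 threads per block, 2 items per thread.
constexpr std::int64_t block_threads    = 256;
constexpr std::int64_t items_per_thread = 2;
constexpr std::int64_t items_per_tile   = block_threads * items_per_thread; // 512

// Hypothetical helper: number of thread blocks (tiles) needed to cover num_items.
// Uses division plus a remainder check so the computation cannot overflow int64_t.
inline std::int64_t num_tiles(std::int64_t num_items)
{
  if (num_items <= 0)
  {
    return 0;
  }
  return num_items / items_per_tile + (num_items % items_per_tile != 0 ? 1 : 0);
}

// Hypothetical helper: number of valid items in a given tile.
// Only the last tile can be partial; tiles past the end hold zero items.
inline std::int64_t items_in_tile(std::int64_t tile, std::int64_t num_items)
{
  assert(tile >= 0);
  if (tile >= num_tiles(num_items))
  {
    return 0;
  }
  const std::int64_t remaining = num_items - tile * items_per_tile;
  return remaining < items_per_tile ? remaining : items_per_tile;
}
```

With 513 items, for example, this yields two tiles: a full one of 512 items and a partial one holding the single remaining item, which is the case a kernel has to guard against reading past the end.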
Following the points above, I suggest the following API:
Tests of this API can be inspired by the cub::DeviceFor::ForEachN test and the C reduction test. The C library should be enabled in the all-dev preset. To build the tests, it's sufficient to run cmake --build . --target cccl.c.test.reduce in the cuda12.5-gcc13 devcontainer.