Implement C API for device for #2295

Closed
gevtushenko opened this issue Aug 26, 2024 · 1 comment · Fixed by #2378

gevtushenko commented Aug 26, 2024

To expose algorithms such as transform, fill, and copy, we have to implement a C version of the cub::DeviceFor::ForEachN algorithm. Development can follow the C version of the reduce API. A few key points:

  • We can ignore cub::DeviceFor::Bulk, cub::DeviceFor::ForEachCopy*, and cub::DeviceFor::ForEach for now and limit this issue to cub::DeviceFor::ForEachN
  • We can ignore the cub::DeviceFor::ForEachN overload that requires temporary storage
  • We can limit offset types to always be std::int64_t
  • We can fix the tuning policy at 256 threads per block and 2 items per thread
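
To make the last two points concrete, here is a minimal host-side sketch of the launch arithmetic they imply. It assumes each thread block covers one tile of 256 × 2 = 512 items, which is the usual CUB convention but is not stated in this issue. The helper name `num_blocks_for` is hypothetical and not part of the proposed API.

```cpp
#include <cassert>
#include <cstdint>

// Fixed tuning from the key points: 256 threads per block, 2 items per thread.
constexpr std::int64_t block_threads    = 256;
constexpr std::int64_t items_per_thread = 2;
constexpr std::int64_t tile_size        = block_threads * items_per_thread; // 512

// Hypothetical helper (not part of the proposed API): number of thread blocks
// needed to cover num_items, ASSUMING one block handles one tile of items.
// Offsets are std::int64_t, so item counts beyond 2^31 are representable.
inline std::int64_t num_blocks_for(std::int64_t num_items)
{
  assert(num_items >= 0);
  // Ceiling division written to avoid overflow for very large num_items.
  return num_items / tile_size + (num_items % tile_size != 0 ? 1 : 0);
}
```

For example, `num_blocks_for(513)` yields 2 under this assumption: one full tile plus a partial one.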

Following the points above, I suggest the following API:

struct cccl_device_for_build_result_t
{
  int cc;
  void* cubin;
  size_t cubin_size;
  CUlibrary library;
  CUkernel static_kernel;
};

extern "C" CCCL_C_API CUresult cccl_device_for_build(
  cccl_device_for_build_result_t* build,
  cccl_iterator_t d_data,
  cccl_op_t op,
  int cc_major,
  int cc_minor,
  const char* cub_path,
  const char* thrust_path,
  const char* libcudacxx_path,
  const char* ctk_path) noexcept;

extern "C" CCCL_C_API CUresult cccl_device_for(
  cccl_device_for_build_result_t build,
  cccl_iterator_t d_data,
  int64_t num_items,
  cccl_op_t op,
  CUstream stream) noexcept;

extern "C" CCCL_C_API CUresult cccl_device_for_cleanup(cccl_device_for_build_result_t* bld_ptr);

Tests of this API can be modeled on the cub::DeviceFor::ForEachN test and the C reduction test. The C library should be enabled in the all-dev preset. To build the tests, it is sufficient to run cmake --build . --target cccl.c.test.reduce in the cuda12.5-gcc13 devcontainer.
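
As a rough illustration of what such a test could compare against, the sketch below is a host-side reference for ForEachN semantics: apply an operation to the first num_items elements and leave the rest untouched. The names `reference_for_each_n` and `example_increment_first_three` are hypothetical and do not come from the existing CUB or C tests. A real test would copy device results back and compare them with this kind of reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference (not taken from the CCCL test suite):
// applies op to the first num_items elements of data, in order.
template <class T, class Op>
void reference_for_each_n(std::vector<T>& data, std::int64_t num_items, Op op)
{
  assert(num_items >= 0 && static_cast<std::size_t>(num_items) <= data.size());
  for (std::int64_t i = 0; i < num_items; ++i)
  {
    op(data[static_cast<std::size_t>(i)]);
  }
}

// Example: add 10 to the first 3 of 5 elements; the last two stay unchanged.
inline std::vector<int> example_increment_first_three()
{
  std::vector<int> v{0, 1, 2, 3, 4};
  reference_for_each_n(v, 3, [](int& x) { x += 10; });
  return v;
}
```

The index type is std::int64_t to match the offset type chosen for the C API above.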


Closed by #2378

@github-project-automation github-project-automation bot moved this from In Progress to Done in CCCL Oct 9, 2024
@gevtushenko gevtushenko linked a pull request Oct 10, 2024 that will close this issue