Commit

cp.async.bulk.tensor: Add .{gather,scatter}4
ahendriksen authored and bernhardmgruber committed Jan 30, 2025
1 parent fdc59a5 commit 05de2b6
Showing 3 changed files with 7 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/libcudacxx/ptx/instructions/cp_async_bulk_tensor.rst
@@ -21,3 +21,8 @@ Multicast
---------

.. include:: generated/cp_async_bulk_tensor_multicast.rst

Scatter / Gather
----------------

.. include:: generated/cp_async_bulk_tensor_gather_scatter.rst
@@ -33,6 +33,7 @@ _LIBCUDACXX_BEGIN_NAMESPACE_CUDA_PTX
// 9.7.8.24.9. Data Movement and Conversion Instructions: cp.async.bulk.tensor
// https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor
#include <cuda/__ptx/instructions/generated/cp_async_bulk_tensor.h>
#include <cuda/__ptx/instructions/generated/cp_async_bulk_tensor_gather_scatter.h>
#include <cuda/__ptx/instructions/generated/cp_async_bulk_tensor_multicast.h>

_LIBCUDACXX_END_NAMESPACE_CUDA_PTX
@@ -17,6 +17,7 @@
#include "nvrtc_workaround.h"
// above header needs to be included before the generated test header
#include "generated/cp_async_bulk_tensor.h"
#include "generated/cp_async_bulk_tensor_gather_scatter.h"

int main(int, char**)
{