Skip to content

Commit

Permalink
PTX: add bfind, exit and trap (#3627)
Browse files Browse the repository at this point in the history
* Re-enable check for orphaned PTX docs
* Regenerate PTX docs, includes and tests
Manually excluding:
- mapa
- elect.sync
- barrier.cluster.aligned
- ldmatrix
* Mark add_ptx_instruction.py executable
* Add bfind, exit and trap
* Fix newline
  • Loading branch information
bernhardmgruber authored Jan 31, 2025
1 parent 6586186 commit 34af9a4
Show file tree
Hide file tree
Showing 22 changed files with 676 additions and 4 deletions.
9 changes: 6 additions & 3 deletions docs/libcudacxx/ptx/instructions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ PTX Instructions
:maxdepth: 1

instructions/barrier_cluster
instructions/bfind
instructions/clusterlaunchcontrol
instructions/cp_async_bulk
instructions/cp_async_bulk_commit_group
Expand All @@ -15,6 +16,7 @@ PTX Instructions
instructions/cp_async_mbarrier_arrive
instructions/cp_reduce_async_bulk
instructions/cp_reduce_async_bulk_tensor
instructions/exit
instructions/fence
instructions/getctarank
instructions/mapa
Expand All @@ -41,6 +43,7 @@ PTX Instructions
instructions/tcgen05_wait
instructions/tensormap_replace
instructions/tensormap_cp_fenceproxy
instructions/trap
instructions/special_registers


Expand Down Expand Up @@ -72,7 +75,7 @@ Instructions by section
* - `clz <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-clz>`__
- No
* - `bfind <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind>`__
- No
- CCCL 3.0.0
* - `fns <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-fns>`__
- No
* - `brev <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-brev>`__
Expand Down Expand Up @@ -365,7 +368,7 @@ Instructions by section
* - `ret <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#control-flow-instructions-ret>`__
- No
* - `exit <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#control-flow-instructions-exit>`__
- No
- CCCL 3.0.0

.. list-table:: `Parallel Synchronization and Communication Instructions <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions>`__
:widths: 50 50
Expand Down Expand Up @@ -529,7 +532,7 @@ Instructions by section
* - `pmevent <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-pmevent>`__
- No
* - `trap <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-trap>`__
- No
- CCCL 3.0.0
* - `setmaxnreg <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg>`__
- No

Expand Down
9 changes: 9 additions & 0 deletions docs/libcudacxx/ptx/instructions/bfind.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _libcudacxx-ptx-instructions-bfind:

bfind
=====

- PTX ISA:
`bfind <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind>`__

.. include:: generated/bfind.rst
9 changes: 9 additions & 0 deletions docs/libcudacxx/ptx/instructions/exit.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _libcudacxx-ptx-instructions-exit:

exit
====

- PTX ISA:
`exit <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#control-flow-instructions-exit>`__

.. include:: generated/exit.rst
74 changes: 74 additions & 0 deletions docs/libcudacxx/ptx/instructions/generated/bfind.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
..
This file was automatically generated. Do not edit.
bfind.u32
^^^^^^^^^
.. code:: cuda
// bfind.u32 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind(
uint32_t a_reg);
bfind.shiftamt.u32
^^^^^^^^^^^^^^^^^^
.. code:: cuda
// bfind.shiftamt.u32 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind_shiftamt(
uint32_t a_reg);
bfind.u64
^^^^^^^^^
.. code:: cuda
// bfind.u64 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind(
uint64_t a_reg);
bfind.shiftamt.u64
^^^^^^^^^^^^^^^^^^
.. code:: cuda
// bfind.shiftamt.u64 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind_shiftamt(
uint64_t a_reg);
bfind.s32
^^^^^^^^^
.. code:: cuda
// bfind.s32 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind(
int32_t a_reg);
bfind.shiftamt.s32
^^^^^^^^^^^^^^^^^^
.. code:: cuda
// bfind.shiftamt.s32 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind_shiftamt(
int32_t a_reg);
bfind.s64
^^^^^^^^^
.. code:: cuda
// bfind.s64 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind(
int64_t a_reg);
bfind.shiftamt.s64
^^^^^^^^^^^^^^^^^^
.. code:: cuda
// bfind.shiftamt.s64 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind_shiftamt(
int64_t a_reg);
10 changes: 10 additions & 0 deletions docs/libcudacxx/ptx/instructions/generated/exit.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
..
This file was automatically generated. Do not edit.
exit
^^^^
.. code:: cuda
// exit; // PTX ISA 10, SM_50
template <typename = void>
__device__ static inline void exit();
10 changes: 10 additions & 0 deletions docs/libcudacxx/ptx/instructions/generated/trap.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
..
This file was automatically generated. Do not edit.
trap
^^^^
.. code:: cuda
// trap; // PTX ISA 10, SM_50
template <typename = void>
__device__ static inline void trap();
9 changes: 9 additions & 0 deletions docs/libcudacxx/ptx/instructions/trap.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _libcudacxx-ptx-instructions-trap:

trap
====

- PTX ISA:
`trap <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-trap>`__

.. include:: generated/trap.rst
2 changes: 1 addition & 1 deletion docs/repo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ api_output_directory = "api"
use_fast_doxygen_conversion = true
sphinx_generate_doxygen_groups = true
sphinx_generate_doxygen_pages = true
sphinx_exclude_patterns = ['ptx/instructions/generated']
sphinx_exclude_patterns = []

[repo_docs.projects.cub]
name = "CUB"
Expand Down
Empty file modified libcudacxx/codegen/add_ptx_instruction.py
100644 → 100755
Empty file.
37 changes: 37 additions & 0 deletions libcudacxx/include/cuda/__ptx/instructions/bfind.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
// -*- C++ -*-
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//

#ifndef _CUDA_PTX_BFIND_H_
#define _CUDA_PTX_BFIND_H_

#include <cuda/std/detail/__config>

#if defined(_CCCL_IMPLICIT_SYSTEM_HEADER_GCC)
# pragma GCC system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_CLANG)
# pragma clang system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_MSVC)
# pragma system_header
#endif // no system header

#include <cuda/__ptx/ptx_dot_variants.h>
#include <cuda/__ptx/ptx_helper_functions.h>
#include <cuda/std/cstdint>

#include <nv/target> // __CUDA_MINIMUM_ARCH__ and friends

_LIBCUDACXX_BEGIN_NAMESPACE_CUDA_PTX

#include <cuda/__ptx/instructions/generated/bfind.h>

_LIBCUDACXX_END_NAMESPACE_CUDA_PTX

#endif // _CUDA_PTX_BFIND_H_
37 changes: 37 additions & 0 deletions libcudacxx/include/cuda/__ptx/instructions/exit.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
// -*- C++ -*-
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//

#ifndef _CUDA_PTX_EXIT_H_
#define _CUDA_PTX_EXIT_H_

#include <cuda/std/detail/__config>

#if defined(_CCCL_IMPLICIT_SYSTEM_HEADER_GCC)
# pragma GCC system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_CLANG)
# pragma clang system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_MSVC)
# pragma system_header
#endif // no system header

#include <cuda/__ptx/ptx_dot_variants.h>
#include <cuda/__ptx/ptx_helper_functions.h>
#include <cuda/std/cstdint>

#include <nv/target> // __CUDA_MINIMUM_ARCH__ and friends

_LIBCUDACXX_BEGIN_NAMESPACE_CUDA_PTX

#include <cuda/__ptx/instructions/generated/exit.h>

_LIBCUDACXX_END_NAMESPACE_CUDA_PTX

#endif // _CUDA_PTX_EXIT_H_
Loading

0 comments on commit 34af9a4

Please sign in to comment.