Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PTX: add bfind, exit and trap #3627

Merged
merged 6 commits into from
Jan 31, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 6 additions & 3 deletions docs/libcudacxx/ptx/instructions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ PTX Instructions
:maxdepth: 1

instructions/barrier_cluster
instructions/bfind
instructions/clusterlaunchcontrol
instructions/cp_async_bulk
instructions/cp_async_bulk_commit_group
Expand All @@ -15,6 +16,7 @@ PTX Instructions
instructions/cp_async_mbarrier_arrive
instructions/cp_reduce_async_bulk
instructions/cp_reduce_async_bulk_tensor
instructions/exit
instructions/fence
instructions/getctarank
instructions/mapa
Expand All @@ -41,6 +43,7 @@ PTX Instructions
instructions/tcgen05_wait
instructions/tensormap_replace
instructions/tensormap_cp_fenceproxy
instructions/trap
instructions/special_registers


Expand Down Expand Up @@ -72,7 +75,7 @@ Instructions by section
* - `clz <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-clz>`__
- No
* - `bfind <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind>`__
- No
- CCCL 3.0.0
* - `fns <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-fns>`__
- No
* - `brev <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-brev>`__
Expand Down Expand Up @@ -365,7 +368,7 @@ Instructions by section
* - `ret <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#control-flow-instructions-ret>`__
- No
* - `exit <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#control-flow-instructions-exit>`__
- No
- CCCL 3.0.0

.. list-table:: `Parallel Synchronization and Communication Instructions <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions>`__
:widths: 50 50
Expand Down Expand Up @@ -529,7 +532,7 @@ Instructions by section
* - `pmevent <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-pmevent>`__
- No
* - `trap <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-trap>`__
- No
- CCCL 3.0.0
* - `setmaxnreg <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg>`__
- No

Expand Down
9 changes: 9 additions & 0 deletions docs/libcudacxx/ptx/instructions/bfind.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _libcudacxx-ptx-instructions-bfind:

bfind
=====

- PTX ISA:
`bfind <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind>`__

.. include:: generated/bfind.rst
9 changes: 9 additions & 0 deletions docs/libcudacxx/ptx/instructions/exit.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _libcudacxx-ptx-instructions-exit:

exit
====

- PTX ISA:
`exit <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#control-flow-instructions-exit>`__

.. include:: generated/exit.rst
74 changes: 74 additions & 0 deletions docs/libcudacxx/ptx/instructions/generated/bfind.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
..
This file was automatically generated. Do not edit.

bfind.u32
^^^^^^^^^
.. code:: cuda

// bfind.u32 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind(
uint32_t a_reg);

bfind.shiftamt.u32
^^^^^^^^^^^^^^^^^^
.. code:: cuda

// bfind.shiftamt.u32 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind_shiftamt(
uint32_t a_reg);

bfind.u64
^^^^^^^^^
.. code:: cuda

// bfind.u64 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind(
uint64_t a_reg);

bfind.shiftamt.u64
^^^^^^^^^^^^^^^^^^
.. code:: cuda

// bfind.shiftamt.u64 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind_shiftamt(
uint64_t a_reg);

bfind.s32
^^^^^^^^^
.. code:: cuda

// bfind.s32 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind(
int32_t a_reg);

bfind.shiftamt.s32
^^^^^^^^^^^^^^^^^^
.. code:: cuda

// bfind.shiftamt.s32 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind_shiftamt(
int32_t a_reg);

bfind.s64
^^^^^^^^^
.. code:: cuda

// bfind.s64 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind(
int64_t a_reg);

bfind.shiftamt.s64
^^^^^^^^^^^^^^^^^^
.. code:: cuda

// bfind.shiftamt.s64 dest, a_reg; // PTX ISA 20, SM_50
template <typename = void>
__device__ static inline uint32_t bfind_shiftamt(
int64_t a_reg);
10 changes: 10 additions & 0 deletions docs/libcudacxx/ptx/instructions/generated/exit.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
..
This file was automatically generated. Do not edit.

exit
^^^^
.. code:: cuda

// exit; // PTX ISA 10, SM_50
template <typename = void>
__device__ static inline void exit();
10 changes: 10 additions & 0 deletions docs/libcudacxx/ptx/instructions/generated/trap.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
..
This file was automatically generated. Do not edit.

trap
^^^^
.. code:: cuda

// trap; // PTX ISA 10, SM_50
template <typename = void>
__device__ static inline void trap();
9 changes: 9 additions & 0 deletions docs/libcudacxx/ptx/instructions/trap.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _libcudacxx-ptx-instructions-trap:

trap
====

- PTX ISA:
`trap <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-trap>`__

.. include:: generated/trap.rst
2 changes: 1 addition & 1 deletion docs/repo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ api_output_directory = "api"
use_fast_doxygen_conversion = true
sphinx_generate_doxygen_groups = true
sphinx_generate_doxygen_pages = true
sphinx_exclude_patterns = ['ptx/instructions/generated']
sphinx_exclude_patterns = []

[repo_docs.projects.cub]
name = "CUB"
Expand Down
Empty file modified libcudacxx/codegen/add_ptx_instruction.py
100644 → 100755
Empty file.
37 changes: 37 additions & 0 deletions libcudacxx/include/cuda/__ptx/instructions/bfind.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
// -*- C++ -*-
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//

#ifndef _CUDA_PTX_BFIND_H_
#define _CUDA_PTX_BFIND_H_

#include <cuda/std/detail/__config>

#if defined(_CCCL_IMPLICIT_SYSTEM_HEADER_GCC)
# pragma GCC system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_CLANG)
# pragma clang system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_MSVC)
# pragma system_header
#endif // no system header

#include <cuda/__ptx/ptx_dot_variants.h>
#include <cuda/__ptx/ptx_helper_functions.h>
#include <cuda/std/cstdint>

#include <nv/target> // __CUDA_MINIMUM_ARCH__ and friends

_LIBCUDACXX_BEGIN_NAMESPACE_CUDA_PTX

#include <cuda/__ptx/instructions/generated/bfind.h>

_LIBCUDACXX_END_NAMESPACE_CUDA_PTX

#endif // _CUDA_PTX_BFIND_H_
37 changes: 37 additions & 0 deletions libcudacxx/include/cuda/__ptx/instructions/exit.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
// -*- C++ -*-
//===----------------------------------------------------------------------===//
//
// Part of libcu++, the C++ Standard Library for your entire system,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//

#ifndef _CUDA_PTX_EXIT_H_
#define _CUDA_PTX_EXIT_H_

#include <cuda/std/detail/__config>

#if defined(_CCCL_IMPLICIT_SYSTEM_HEADER_GCC)
# pragma GCC system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_CLANG)
# pragma clang system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_MSVC)
# pragma system_header
#endif // no system header

#include <cuda/__ptx/ptx_dot_variants.h>
#include <cuda/__ptx/ptx_helper_functions.h>
#include <cuda/std/cstdint>

#include <nv/target> // __CUDA_MINIMUM_ARCH__ and friends

_LIBCUDACXX_BEGIN_NAMESPACE_CUDA_PTX

#include <cuda/__ptx/instructions/generated/exit.h>

_LIBCUDACXX_END_NAMESPACE_CUDA_PTX

#endif // _CUDA_PTX_EXIT_H_
Loading
Loading