This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 186
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add documentation for atomic_ref (#257)
* Add documentation for atomic_ref * Add limitations to atomic_ref section * Update nav_order for synchronization primitives * Fixup and describe additional limitations in atomic_ref * Update docs/extended_api/synchronization_primitives/atomic_ref.md Co-authored-by: gonzalobg <[email protected]> * Update docs/extended_api/synchronization_primitives/atomic_ref.md Co-authored-by: Jake Hemstad <[email protected]> * Update verbiage covering concurrent access with different scoped atomics. Co-authored-by: gonzalobg <[email protected]> * Clarify size restrictions on `atomic_ref`. * Apply suggestions from code review * Final touches. Co-authored-by: gonzalobg <[email protected]> Co-authored-by: Jake Hemstad <[email protected]>
- Loading branch information
1 parent
e2d7c47
commit 3fdea17
Showing
6 changed files
with
106 additions
and
5 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
99 changes: 99 additions & 0 deletions
99
docs/extended_api/synchronization_primitives/atomic_ref.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
--- | ||
grand_parent: Extended API | ||
parent: Synchronization Primitives | ||
nav_order: 1 | ||
--- | ||
|
||
# `cuda::atomic_ref` | ||
|
||
Defined in header `<cuda/atomic>`: | ||
|
||
```cuda | ||
template <typename T, cuda::thread_scope Scope = cuda::thread_scope_system> | ||
class cuda::atomic_ref; | ||
``` | ||
|
||
The class template `cuda::atomic_ref` is an extended form of [`cuda::std::atomic_ref`] | ||
that takes an additional [`cuda::thread_scope`] argument, defaulted to | ||
`cuda::std::thread_scope_system`. | ||
It has the same interface and semantics as [`cuda::std::atomic_ref`], with the | ||
following additional operations. | ||
This class additionally deviates from the standard by being backported to C++11. | ||
|
||
## Limitations | ||
|
||
`cuda::atomic_ref<T>` and `cuda::std::atomic_ref<T>` may only be instantiated with a T that are either 4 or 8 bytes. | ||
|
||
No object or subobject of an object referenced by an `atomic_ref` shall be concurrently referenced by any other `atomic_ref` that has a different `Scope`. | ||
|
||
For `cuda::atomic_ref<T>` and `cuda::std::atomic_ref<T>` the type `T` must satisfy the following: | ||
- `4 <= sizeof(T) <= 8`. | ||
- `T` must not have "padding bits", i.e., 'T`'s [object representation](https://en.cppreference.com/w/cpp/language/object#Object_representation_and_value_representation) must not have bits that do not participate in it's value representation. | ||
|
||
## Atomic Extrema Operations | ||
|
||
| [`cuda::atomic_ref::fetch_min`] | Atomically find the minimum of the stored value and a provided value. `(member function)` | | ||
| [`cuda::atomic_ref::fetch_max`] | Atomically find the maximum of the stored value and a provided value. `(member function)` | | ||
|
||
## Concurrency Restrictions | ||
|
||
See [`memory model`] documentation for general restrictions on atomicity. | ||
|
||
With CUDA Compute Capability 6 (Pascal), an object of type `atomic_ref` may not be | ||
used: | ||
- with a reference to an object with a automatic storage duration in a GPU thread, or | ||
- if `is_always_lock_free()` is `false`. | ||
|
||
For CUDA Compute Capability prior to 6 (Pascal), objects of type | ||
`cuda::atomic_ref` or [`cuda::std::atomic_ref`] may not be used. | ||
|
||
## Implementation-Defined Behavior | ||
|
||
For each type `T` and [`cuda::thread_scope`] `S`, the value of | ||
`cuda::atomic_ref<T, S>::is_always_lock_free()` and | ||
`cuda::std::atomic_ref<T>::is_always_lock_free()` is as follows: | ||
|
||
| Type `T` | [`cuda::thread_scope`] `S` | `cuda::atomic_ref<T, S>::is_always_lock_free()` | | ||
|----------|----------------------------|---------------------------------------------| | ||
| Any | Any | `sizeof(T) <= 8` | | ||
|
||
## Example | ||
|
||
```cuda | ||
#include <cuda/atomic> | ||
__global__ void example_kernel(int *gmem, int *pinned_mem) { | ||
// This atomic is suitable for all threads in the system. | ||
cuda::atomic_ref<int, cuda::thread_scope_system> a(pinned_mem); | ||
// This atomic has the same type as the previous one (`a`). | ||
cuda::atomic_ref<int> b(pinned_mem); | ||
// This atomic is suitable for all threads on the current processor (e.g. GPU). | ||
cuda::atomic_ref<int, cuda::thread_scope_device> c(gmem); | ||
__shared__ int shared_v; | ||
// This atomic is suitable for threads in the same thread block. | ||
cuda::atomic_ref<int, cuda::thread_scope_block> d(&shared); | ||
} | ||
``` | ||
|
||
[See it on Godbolt](https://godbolt.org/z/fr4K7ErEh){: .btn } | ||
|
||
|
||
[`cuda::thread_scope`]: ../thread_scopes.md | ||
[`memory model`]: ../memory_model.md | ||
|
||
[`cuda::atomic_thread_fence`]: ./atomic/atomic_thread_fence.md | ||
|
||
[`cuda::atomic_ref::fetch_min`]: ./atomic/fetch_min.md | ||
[`cuda::atomic_ref::fetch_max`]: ./atomic/fetch_max.md | ||
|
||
[`cuda::std::atomic_ref`]: https://en.cppreference.com/w/cpp/atomic/atomic_ref | ||
|
||
[atomics.types.int]: https://eel.is/c++draft/atomics.types.int | ||
[atomics.types.pointer]: https://eel.is/c++draft/atomics.types.pointer | ||
|
||
[`concurrentManagedAccess` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_116f9619ccc85e93bc456b8c69c80e78b | ||
[`hostNativeAtomicSupported` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_1ef82fd7d1d0413c7d6f33287e5b6306f | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2 changes: 1 addition & 1 deletion
2
docs/extended_api/synchronization_primitives/binary_semaphore.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2 changes: 1 addition & 1 deletion
2
docs/extended_api/synchronization_primitives/counting_semaphore.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
--- | ||
grand_parent: Extended API | ||
parent: Synchronization Primitives | ||
nav_order: 1 | ||
nav_order: 2 | ||
--- | ||
|
||
# `cuda::latch` | ||
|