
mdspan cache policy accessors #2487

Closed
wants to merge 27 commits from the mdspan-cache-operator-accessors branch

Conversation

@fbusato (Contributor) commented Sep 30, 2024

closes #2472

Add custom CUDA mdspan accessors to enable cache operators.
The PR covers the following features:

  • A cache_policy_accessor for load and store operations
  • A cache_policy_accessor for load-only operations
  • An accessor_reference for dispatching load and store operations in different ways
  • Low-level memory accesses rely on cub::ThreadLoad and cub::ThreadStore (see the related issue [FEA]: Improve and cleanup ThreadLoad #2486 for improving the two methods)

(names to finalize later)
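
For a sense of the intended usage, a minimal hypothetical sketch follows. The accessor name streaming_load_accessor, the use of __ldcs, and the kernel are illustrative placeholders rather than the PR's API; the PR itself dispatches accesses through cub::ThreadLoad / cub::ThreadStore and its names were still open.

#include <cuda/std/mdspan>
#include <cstddef>

// Hypothetical read-only accessor: loads are issued with __ldcs, hinting that
// the data is streamed once and may be evicted early. This only illustrates
// the accessor-policy shape; the PR's accessors use different names and
// dispatch through cub::ThreadLoad / cub::ThreadStore.
struct streaming_load_accessor
{
  using element_type     = const float;
  using data_handle_type = const float*;
  using reference        = float; // loads return by value, so the view is read-only
  using offset_policy    = streaming_load_accessor;

  __device__ reference access(data_handle_type p, size_t i) const noexcept
  {
    return __ldcs(p + i); // cache-hinted load instead of a plain dereference
  }

  __device__ data_handle_type offset(data_handle_type p, size_t i) const noexcept
  {
    return p + i;
  }
};

__global__ void scale(const float* in, float* out, size_t n)
{
  // Plug the custom accessor into an ordinary rank-1 mdspan.
  cuda::std::mdspan<const float,
                    cuda::std::dextents<size_t, 1>,
                    cuda::std::layout_right,
                    streaming_load_accessor>
    view{in, n};

  size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
  {
    out[i] = 2.0f * view(i);
  }
}

The point is that element access through the view compiles down to a cache-hinted load while the mdspan interface itself stays unchanged.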

(Several review threads on libcudacxx/include/cuda/__mdspan/optimized_accessors.h were marked outdated and resolved.)

_LIBCUDACXX_BEGIN_NAMESPACE_CUDA

enum class EvictionPolicy
Collaborator:

Is that a publicly facing enumeration?

If so, we should document it.
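
For context, eviction policies of this kind typically map to the PTX L2 cache-eviction priorities; a hypothetical sketch (enumerator names and values are placeholders, not the PR's):

// Placeholder sketch modeled on the PTX cache-eviction priorities
// (evict_normal, evict_first, evict_last, evict_unchanged, no_allocate).
// The PR's actual enumerators, values, and documentation may differ.
enum class EvictionPolicy
{
  Default,   // evict_normal: regular caching behavior
  First,     // evict_first: prefer evicting this line before others
  Last,      // evict_last: keep this line cached as long as possible
  Unchanged, // evict_unchanged: leave the line's eviction priority as-is
  NoAlloc    // no_allocate: do not allocate the line in cache at all
};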


accessor_reference(accessor_reference&&) = delete;

_CCCL_HIDE_FROM_ABI _CCCL_DEVICE _CCCL_FORCEINLINE accessor_reference(const accessor_reference&) = default;
Collaborator:

I am not a big fan of putting _CCCL_FORCEINLINE everywhere

I was investigating making it part of _CCCL_HIDE_FROM_ABI, but that led to a ton of compiler issues

Contributor (author):

I think it makes sense for small functions, especially in CUDA
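
In plain CUDA terms, independent of the _CCCL_* macros, the motivation is that a one-instruction wrapper should not survive as a call; a hedged illustration:

#include <cstddef>

// A tiny device wrapper around a single load: if it is not inlined, the call
// overhead can exceed the cost of the load it wraps. __forceinline__ asks the
// compiler to fold it into the caller.
__device__ __forceinline__ float load_element(const float* ptr, std::size_t i)
{
  return __ldg(ptr + i); // load through the read-only data cache
}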

Comment on lines 113 to 116
static_assert(!::cuda::std::is_array<_ElementType>::value,
              "cache_policy_accessor: template argument may not be an array type");
static_assert(!::cuda::std::is_abstract<_ElementType>::value,
              "cache_policy_accessor: template argument may not be an abstract class");
Collaborator:

I am wondering why those constraints are not in the other case

Contributor (author):

They are on both versions (const and non-const) of cache_policy_accessor.
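
Schematically, and purely as a hypothetical shape (the PR may spell the two variants differently), the point is that the same checks guard both:

#include <cuda/std/type_traits>

// Hypothetical shape only: the same two static_asserts appear in both the
// read-write primary template and the read-only variant for const element types.
template <typename _ElementType>
class cache_policy_accessor_sketch // stands in for the read-write variant
{
  static_assert(!::cuda::std::is_array<_ElementType>::value,
                "cache_policy_accessor: template argument may not be an array type");
  static_assert(!::cuda::std::is_abstract<_ElementType>::value,
                "cache_policy_accessor: template argument may not be an abstract class");
};

template <typename _ElementType>
class cache_policy_accessor_sketch<const _ElementType> // stands in for the read-only variant
{
  static_assert(!::cuda::std::is_array<_ElementType>::value,
                "cache_policy_accessor: template argument may not be an array type");
  static_assert(!::cuda::std::is_abstract<_ElementType>::value,
                "cache_policy_accessor: template argument may not be an abstract class");
};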


copy-pr-bot bot commented Oct 2, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

miscco and others added 13 commits October 2, 2024 19:09
…VIDIA#2483)

* Fix `common_type` specialization for extended floating point types

The machinery we had in place was not really suited to specialize `common_type` because it would take precedence over the actual implementation of `common_type`.

In that case, we only specialized `common_type<__half, __half>` but not `common_type<__half, __half&>` and so on.

This shows how brittle the whole thing is and that it is not extensible.

Rather than putting another bandaid over it, add a proper 5th step in the common_type detection that properly treats combinations of an extended floating point type with an arithmetic type.

Allowing arithmetic types is necessary to keep machinery like `pow(__half, 2)` working.

Fixes [BUG]: `is_common_type` trait is broken when mixing rvalue references NVIDIA#2419

* Work around MSVC declval bug
There is an incredible compiler bug reported in nvbug4867473 where the use of a system header changes the way some types are instantiated.

The culprit seems to be that within a system header the compiler accepts narrowing conversions that it should not accept

Work around it by moving __is_non_narrowing_convertible to its own header that is included before we define the system header machinery
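
As a hedged illustration of the intended post-fix behavior described above (not code from the commit):

#include <cuda/std/type_traits>
#include <cuda_fp16.h>

// With the extra common_type step, reference-qualified mixes of the extended
// floating-point type resolve the same way the plain pair does.
static_assert(::cuda::std::is_same<::cuda::std::common_type<__half, __half>::type, __half>::value, "");
static_assert(::cuda::std::is_same<::cuda::std::common_type<__half, __half&>::type, __half>::value, "");
// Combinations with arithmetic types are also handled, which is what keeps
// calls like pow(__half_value, 2) working.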
Signed-off-by: fbusato <[email protected]>
Signed-off-by: fbusato <[email protected]>
…erty (NVIDIA#2489)

Currently, we implicitly assume that any resource that has no execution space property is host accessible.

However, that is not a good design, as it provides a source of surprise and numerous challenges with proper type matching down the road.

So rather than implicitly assuming that something is host accessible, we require the user to always provide at least one execution space property.
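
A rough sketch of what providing an execution space property looks like for a user-defined resource; the cuda::mr::host_accessible / cuda::mr::device_accessible tags and the get_property hidden-friend pattern come from <cuda/memory_resource>, while the resource itself is hypothetical:

// Note: at the time, <cuda/memory_resource> was still experimental and may
// require an explicit opt-in macro to use.
#include <cuda/memory_resource>
#include <cstddef>

// Sketch of a resource that states its execution spaces explicitly instead of
// relying on an implicit "no property means host accessible" assumption.
struct pinned_memory_resource_sketch
{
  void* allocate(std::size_t bytes, std::size_t alignment);
  void deallocate(void* ptr, std::size_t bytes, std::size_t alignment);

  bool operator==(const pinned_memory_resource_sketch&) const noexcept { return true; }
  bool operator!=(const pinned_memory_resource_sketch&) const noexcept { return false; }

  // Declare that this memory can be touched from both host and device code.
  friend void get_property(const pinned_memory_resource_sketch&, cuda::mr::host_accessible) noexcept {}
  friend void get_property(const pinned_memory_resource_sketch&, cuda::mr::device_accessible) noexcept {}
};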
* Move builtin detection to its own file

* Try to reenable more builtins

* Address review comments
This is used in the `cudax::vector` PR and is the only dependency change of libcu++ that blows up the CI
Signed-off-by: fbusato <[email protected]>
Signed-off-by: fbusato <[email protected]>
@fbusato fbusato force-pushed the mdspan-cache-operator-accessors branch from 92b9963 to b72b013 on October 2, 2024 19:10
@fbusato fbusato marked this pull request as ready for review October 14, 2024 22:32
@fbusato fbusato requested review from a team as code owners October 14, 2024 22:32
@fbusato fbusato requested review from wmaxey and elstehle October 14, 2024 22:32
@fbusato fbusato self-assigned this Nov 6, 2024
@fbusato fbusato closed this Jan 15, 2025
@fbusato fbusato deleted the mdspan-cache-operator-accessors branch January 15, 2025 01:23
Labels: None yet
Projects: Archived in project
Development: Successfully merging this pull request may close the following issue: [FEA]: Provide cuda:: optimized accessors for mdspan (#2472)
3 participants