
mdspan cache policy accessors #2487

Closed
wants to merge 27 commits from the mdspan-cache-operator-accessors branch

Conversation

@fbusato (Contributor) commented Sep 30, 2024

closes #2472

Add custom CUDA mdspan accessors to enable cache operators.
The PR covers the following features:

  • A cache_policy_accessor for load and store operations
  • A cache_policy_accessor for load-only operations
  • An accessor_reference for dispatching load and store operations in different ways
  • Low-level memory accesses rely on cub::ThreadLoad and cub::ThreadStore (see the related issue [FEA]: Improve and cleanup ThreadLoad #2486 for improving the two methods)

(names to finalize later)
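
For a sense of the intended usage, a minimal hypothetical sketch follows. The accessor name streaming_load_accessor, the use of __ldcs, and the kernel are illustrative placeholders rather than the PR's API; the PR itself dispatches accesses through cub::ThreadLoad / cub::ThreadStore and its names were still open.

#include <cuda/std/mdspan>
#include <cstddef>

// Hypothetical read-only accessor: loads are issued with __ldcs, hinting that
// the data is streamed once and may be evicted early. This only illustrates
// the accessor-policy shape; the PR's accessors use different names and
// dispatch through cub::ThreadLoad / cub::ThreadStore.
struct streaming_load_accessor
{
  using element_type     = const float;
  using data_handle_type = const float*;
  using reference        = float; // loads return by value, so the view is read-only
  using offset_policy    = streaming_load_accessor;

  __device__ reference access(data_handle_type p, size_t i) const noexcept
  {
    return __ldcs(p + i); // cache-hinted load instead of a plain dereference
  }

  __device__ data_handle_type offset(data_handle_type p, size_t i) const noexcept
  {
    return p + i;
  }
};

__global__ void scale(const float* in, float* out, size_t n)
{
  // Plug the custom accessor into an ordinary rank-1 mdspan.
  cuda::std::mdspan<const float,
                    cuda::std::dextents<size_t, 1>,
                    cuda::std::layout_right,
                    streaming_load_accessor>
    view{in, n};

  size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
  {
    out[i] = 2.0f * view(i);
  }
}

The point is that element access through the view compiles down to a cache-hinted load while the mdspan interface itself stays unchanged.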

(Several review threads on libcudacxx/include/cuda/__mdspan/optimized_accessors.h were marked outdated and resolved.)

_LIBCUDACXX_BEGIN_NAMESPACE_CUDA

enum class EvictionPolicy
Collaborator:

Is that a publicly facing enumeration?

If so, we should document it.
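
For context, eviction policies of this kind typically map to the PTX L2 cache-eviction priorities; a hypothetical sketch (enumerator names and values are placeholders, not the PR's):

// Placeholder sketch modeled on the PTX cache-eviction priorities
// (evict_normal, evict_first, evict_last, evict_unchanged, no_allocate).
// The PR's actual enumerators, values, and documentation may differ.
enum class EvictionPolicy
{
  Default,   // evict_normal: regular caching behavior
  First,     // evict_first: prefer evicting this line before others
  Last,      // evict_last: keep this line cached as long as possible
  Unchanged, // evict_unchanged: leave the line's eviction priority as-is
  NoAlloc    // no_allocate: do not allocate the line in cache at all
};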


accessor_reference(accessor_reference&&) = delete;

_CCCL_HIDE_FROM_ABI _CCCL_DEVICE _CCCL_FORCEINLINE accessor_reference(const accessor_reference&) = default;
Collaborator:

I am not a big fan of putting _CCCL_FORCEINLINE everywhere

I was investigating making it part of _CCCL_HIDE_FROM_ABI, but that led to a ton of compiler issues

Contributor (author):

I think it makes sense for small functions, especially in CUDA
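
In plain CUDA terms, independent of the _CCCL_* macros, the motivation is that a one-instruction wrapper should not survive as a call; a hedged illustration:

#include <cstddef>

// A tiny device wrapper around a single load: if it is not inlined, the call
// overhead can exceed the cost of the load it wraps. __forceinline__ asks the
// compiler to fold it into the caller.
__device__ __forceinline__ float load_element(const float* ptr, std::size_t i)
{
  return __ldg(ptr + i); // load through the read-only data cache
}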

Comment on lines 113 to 116
static_assert(!::cuda::std::is_array<_ElementType>::value,
              "cache_policy_accessor: template argument may not be an array type");
static_assert(!::cuda::std::is_abstract<_ElementType>::value,
              "cache_policy_accessor: template argument may not be an abstract class");
Collaborator:

I am wondering why those constraints are not in the other case

Contributor (author):

They are on both versions (const and non-const) of cache_policy_accessor.
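
Schematically, and purely as a hypothetical shape (the PR may spell the two variants differently), the point is that the same checks guard both:

#include <cuda/std/type_traits>

// Hypothetical shape only: the same two static_asserts appear in both the
// read-write primary template and the read-only variant for const element types.
template <typename _ElementType>
class cache_policy_accessor_sketch // stands in for the read-write variant
{
  static_assert(!::cuda::std::is_array<_ElementType>::value,
                "cache_policy_accessor: template argument may not be an array type");
  static_assert(!::cuda::std::is_abstract<_ElementType>::value,
                "cache_policy_accessor: template argument may not be an abstract class");
};

template <typename _ElementType>
class cache_policy_accessor_sketch<const _ElementType> // stands in for the read-only variant
{
  static_assert(!::cuda::std::is_array<_ElementType>::value,
                "cache_policy_accessor: template argument may not be an array type");
  static_assert(!::cuda::std::is_abstract<_ElementType>::value,
                "cache_policy_accessor: template argument may not be an abstract class");
};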


copy-pr-bot bot commented Oct 2, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

miscco and others added 13 commits October 2, 2024 19:09
…VIDIA#2483)

* Fix `common_type` specialization for extended floating point types

The machinery we had in place was not really suited to specialize `common_type` because it would take precedence over the actual implementation of `common_type`.

In that case, we only specialized `common_type<__half, __half>` but not `common_type<__half, __half&>` and so on.

This shows how brittle the whole thing is and that it is not extensible.

Rather than putting another bandaid over it, add a proper 5th step in the common_type detection that properly treats combinations of an extended floating point type with an arithmetic type.

Allowing arithmetic types is necessary to keep machinery like `pow(__half, 2)` working.

Fixes [BUG]: `is_common_type` trait is broken when mixing rvalue references NVIDIA#2419

* Work around MSVC declval bug
There is an incredible compiler bug reported in nvbug4867473 where the use of a system header changes the way some types are instantiated.

The culprit seems to be that within a system header the compiler accepts narrowing conversions that it should not accept

Work around it by moving __is_non_narrowing_convertible to its own header that is included before we define the system header machinery
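
As a hedged illustration of the intended post-fix behavior described above (not code from the commit):

#include <cuda/std/type_traits>
#include <cuda_fp16.h>

// With the extra common_type step, reference-qualified mixes of the extended
// floating-point type resolve the same way the plain pair does.
static_assert(::cuda::std::is_same<::cuda::std::common_type<__half, __half>::type, __half>::value, "");
static_assert(::cuda::std::is_same<::cuda::std::common_type<__half, __half&>::type, __half>::value, "");
// Combinations with arithmetic types are also handled, which is what keeps
// calls like pow(__half_value, 2) working.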
Signed-off-by: fbusato <[email protected]>
Signed-off-by: fbusato <[email protected]>
…erty (NVIDIA#2489)

Currently, we implicitly assume that any resource that has no execution space property is host accessible.

However, that is not a good design, as it provides a source of surprise and numerous challenges with proper type matching down the road.

So rather than implicitly assuming that something is host accessible, we require the user to always provide at least one execution space property.
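
A rough sketch of what providing an execution space property looks like for a user-defined resource; the cuda::mr::host_accessible / cuda::mr::device_accessible tags and the get_property hidden-friend pattern come from <cuda/memory_resource>, while the resource itself is hypothetical:

// Note: at the time, <cuda/memory_resource> was still experimental and may
// require an explicit opt-in macro to use.
#include <cuda/memory_resource>
#include <cstddef>

// Sketch of a resource that states its execution spaces explicitly instead of
// relying on an implicit "no property means host accessible" assumption.
struct pinned_memory_resource_sketch
{
  void* allocate(std::size_t bytes, std::size_t alignment);
  void deallocate(void* ptr, std::size_t bytes, std::size_t alignment);

  bool operator==(const pinned_memory_resource_sketch&) const noexcept { return true; }
  bool operator!=(const pinned_memory_resource_sketch&) const noexcept { return false; }

  // Declare that this memory can be touched from both host and device code.
  friend void get_property(const pinned_memory_resource_sketch&, cuda::mr::host_accessible) noexcept {}
  friend void get_property(const pinned_memory_resource_sketch&, cuda::mr::device_accessible) noexcept {}
};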
* Move builtin detection to its own file

* Try to reenable more builtins

* Address review comments
This is used in the `cudax::vector` PR and is the only dependency change of libcu++ that blows up the CI
Signed-off-by: fbusato <[email protected]>
Signed-off-by: fbusato <[email protected]>
@fbusato fbusato force-pushed the mdspan-cache-operator-accessors branch from 92b9963 to b72b013 on October 2, 2024 19:10
@fbusato fbusato marked this pull request as ready for review October 14, 2024 22:32
@fbusato fbusato requested review from a team as code owners October 14, 2024 22:32
@fbusato fbusato requested review from wmaxey and elstehle October 14, 2024 22:32
@fbusato fbusato self-assigned this Nov 6, 2024
@fbusato fbusato closed this Jan 15, 2025
@fbusato fbusato deleted the mdspan-cache-operator-accessors branch January 15, 2025 01:23
Labels: None yet
Projects: Archived in project
Development: Successfully merging this pull request may close the following issue: [FEA]: Provide cuda:: optimized accessors for mdspan (#2472)
3 participants