`mdspan` cache policy accessors #2487
_LIBCUDACXX_BEGIN_NAMESPACE_CUDA

enum class EvictionPolicy
Is that a public-facing enumeration? If so, we should document it.
accessor_reference(accessor_reference&&) = delete;

_CCCL_HIDE_FROM_ABI _CCCL_DEVICE _CCCL_FORCEINLINE accessor_reference(const accessor_reference&) = default;
I am not a big fan of putting `_CCCL_FORCEINLINE` everywhere. I was investigating making it part of `_CCCL_HIDE_FROM_ABI`, but that led to a ton of compiler issues.
I think it makes sense for small functions, especially in CUDA
static_assert(!::cuda::std::is_array<_ElementType>::value,
              "cache_policy_accessor: template argument may not be an array type");
static_assert(!::cuda::std::is_abstract<_ElementType>::value,
              "cache_policy_accessor: template argument may not be an abstract class");
I am wondering why those constraints are not in the other case
They are on both versions (const/non-const) of `cache_policy_accessor`.
Co-authored-by: Michael Schellenberger Costa <[email protected]>
…VIDIA#2483)

* Fix `common_type` specialization for extended floating point types

The machinery we had in place was not really suited to specialize `common_type` because it would take precedence over the actual implementation of `common_type`. In that case, we only specialized `common_type<__half, __half>` but not `common_type<__half, __half&>` and so on.

This shows how brittle the whole thing is and that it is not extensible. Rather than putting another bandaid over it, add a proper 5th step in the `common_type` detection that properly treats combinations of an extended floating point type with an arithmetic type.

Allowing arithmetic types is necessary to keep machinery like `pow(__half, 2)` working.

Fixes [BUG]: `is_common_type` trait is broken when mixing rvalue references NVIDIA#2419

* Work around MSVC declval bug
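For context on the `common_type<__half, __half&>` case mentioned above: the standard `std::common_type` decays both arguments before looking up a user specialization, so a specialization written for the decayed pair also covers reference-qualified combinations. The following is a minimal host-only sketch of that standard behavior, not libcu++'s implementation; `half_like` and `common_is_float` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for an extended floating point type such as __half.
struct half_like {
  float value;
  explicit operator float() const { return value; }
};

// User specializations are only written for the decayed (non-reference,
// cv-unqualified) pair of types.
namespace std {
template <>
struct common_type<half_like, float> { using type = float; };
template <>
struct common_type<float, half_like> { using type = float; };
} // namespace std

// Hypothetical helper: true when the common type of A and B is float.
template <class A, class B>
constexpr bool common_is_float =
    std::is_same_v<std::common_type_t<A, B>, float>;

// The primary template decays its arguments first, so the reference-qualified
// combinations resolve to the specializations above.
static_assert(common_is_float<half_like, float>, "decayed pair");
static_assert(common_is_float<half_like&, const float&>, "lvalue references");
static_assert(common_is_float<float&&, const half_like&>, "mixed references");
```

A trait that bypasses this decay step would need one specialization per reference combination, which is the brittleness the commit message describes.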
There is an incredible compiler bug reported in nvbug4867473 where the use of a system header changes the way some types are instantiated. The culprit seems to be that within a system header the compiler accepts narrowing conversions that it should not accept.

Work around it by moving `__is_non_narrowing_convertible` to its own header that is included before we define the system header machinery.
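The definition of `__is_non_narrowing_convertible` is not shown in this thread. As a rough illustration of why such a trait is sensitive to how the compiler treats narrowing, here is a minimal sketch of one common detection technique, assuming C++17: list-initialize a one-element array of the target type and let SFINAE reject the narrowing case. The names `one_element` and `is_non_narrowing_convertible_sketch` are hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Helper aggregate: list-initializing its array member is ill-formed when the
// initializer requires a narrowing conversion.
template <class To>
struct one_element {
  To value[1];
};

// Primary template: assume the conversion is narrowing or impossible.
template <class From, class To, class = void>
struct is_non_narrowing_convertible_sketch : std::false_type {};

// Selected only when From can list-initialize To without narrowing.
template <class From, class To>
struct is_non_narrowing_convertible_sketch<
    From, To,
    std::void_t<decltype(one_element<To>{{std::declval<From>()}})>>
    : std::true_type {};
```

The result depends entirely on the compiler rejecting the narrowing initialization during substitution. If narrowing is silently accepted in some context, as the commit message reports for system headers, the trait would report `true` for conversions such as `double` to `int`.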
Signed-off-by: fbusato <[email protected]>
…erty (NVIDIA#2489)

Currently we implicitly assumed that any resource that had no execution space property was host accessible. However, that is not a good design, as it provides a source of surprise and numerous challenges with proper type matching down the road.

So rather than implicitly assuming that something is host accessible, we require the user to always provide at least one execution space property.
* Move builtin detection to its own file
* Try to reenable more builtins
* Address review comments
This is used in the `cudax::vector` PR and the only dependency change of libcu++ which blows up the CI
92b9963 to b72b013
closes #2472

Add custom CUDA `mdspan` accessors to enable cache operators.

The PR covers the following features:

* `cache_policy_accessor` for load and store operations
* `cache_policy_accessor` for load-only operations
* `accessor_reference` for dispatching load and store operations in different ways
* `cub::ThreadLoad` and `cub::ThreadStore` (related issue [FEA]: Improve and cleanup `ThreadLoad` #2486 for improving the two methods)

(names to finalize later)
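To illustrate the shape of the design described above, here is a minimal host-only sketch of an `mdspan`-style accessor pair with a proxy reference. It is not the PR's code: the `eviction_hint` enumerators, the `hinted_load`/`hinted_store` stand-ins (plain pointer accesses here, where the PR relies on `cub::ThreadLoad` and `cub::ThreadStore`), and all `_sketch` names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical policy tag; the PR's EvictionPolicy enumerators are not shown here.
enum class eviction_hint { none, first, last };

// Stand-ins for cache-hinted memory operations. On device these would be
// replaced by hinted loads/stores; here they are plain accesses.
template <eviction_hint Hint, class T>
T hinted_load(const T* ptr) { return *ptr; }

template <eviction_hint Hint, class T>
void hinted_store(T* ptr, const T& value) { *ptr = value; }

// Proxy reference: reads dispatch to hinted_load, writes to hinted_store.
template <class T, eviction_hint Hint>
class accessor_reference_sketch {
  T* ptr_;

public:
  explicit accessor_reference_sketch(T* ptr) noexcept : ptr_(ptr) {}

  operator T() const { return hinted_load<Hint>(ptr_); }

  const accessor_reference_sketch& operator=(const T& value) const {
    hinted_store<Hint>(ptr_, value);
    return *this;
  }
};

// Load/store accessor: access() returns the proxy reference.
template <class T, eviction_hint Hint>
struct cache_policy_accessor_sketch {
  static_assert(!std::is_array<T>::value, "element type may not be an array type");
  static_assert(!std::is_abstract<T>::value, "element type may not be an abstract class");

  using offset_policy    = cache_policy_accessor_sketch;
  using element_type     = T;
  using reference        = accessor_reference_sketch<T, Hint>;
  using data_handle_type = T*;

  reference access(data_handle_type p, std::size_t i) const noexcept {
    return reference(p + i);
  }
  data_handle_type offset(data_handle_type p, std::size_t i) const noexcept {
    return p + i;
  }
};

// Load-only accessor for const element types: access() returns a value.
template <class T, eviction_hint Hint>
struct cache_policy_accessor_sketch<const T, Hint> {
  static_assert(!std::is_array<T>::value, "element type may not be an array type");
  static_assert(!std::is_abstract<T>::value, "element type may not be an abstract class");

  using offset_policy    = cache_policy_accessor_sketch;
  using element_type     = const T;
  using reference        = T;
  using data_handle_type = const T*;

  reference access(data_handle_type p, std::size_t i) const {
    return hinted_load<Hint>(p + i);
  }
  data_handle_type offset(data_handle_type p, std::size_t i) const noexcept {
    return p + i;
  }
};
```

With this shape, `acc.access(ptr, i) = v` goes through the store path and reading the result of `access` goes through the load path, so the cache hint is applied without changing code that indexes through the accessor.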