-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix block reduce alignment #1233
Fix block reduce alignment #1233
Conversation
Performance is equivalent after the change:
|
template <::cuda::std::size_t Alignment> | ||
struct max_alignment_t<Alignment> | ||
{ | ||
constexpr static ::cuda::std::size_t value = Alignment; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
we usually go with the ordering of static constexpr
because static is the more relevant information here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remark: clang-format can handle such orderings using the QualifierOrder style option, which we seem to not set in our .clang-format
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened: #1748
Addressed by nvbug 4428282 |
Description
closes #1229
Vectorized loads of temporary storage in block reduce, lead to unaligned access. This PR marks temporary storage for block reduce as
struct alignas(detail::max_alignment_t<16, T, WarpReduceStorage>::value) _TempStorage
. This and reordering of member variables in temporary storage, ensures that warp aggregate loads can be vectorized.Note that other parts of CUB rely on
__align__(16)
. This is potentially problematic for custom types with explicitly specified alignment. I'll create a separate issue for unifying the approach later.Checklist