[FEA]: Make DeviceMergeSort also consider ValueT for scaling down ITEMS_PER_THREAD in its policy #1141

Open
1 task done
elstehle opened this issue Nov 22, 2023 · 0 comments
Labels
feature request New feature or request.

Comments

@elstehle
Collaborator

Is this a duplicate?

Area

CUB

Is your feature request related to a problem? Please describe.

In most of our tuning policies, we scale down ITEMS_PER_THREAD as the size of the type the algorithm works on grows. The main motivation is to keep shared memory and/or register usage in our kernels roughly constant across data type sizes.

In DeviceMergeSort, we scale down ITEMS_PER_THREAD as sizeof(KeyT) grows. However, when sorting pairs, we do not take ValueT into account when reducing ITEMS_PER_THREAD.

As a result, when sorting, for instance, a 128-bit value type along with a 32-bit key type, we need to use virtual shared memory (in our current implementation) or the fallback policy (once #1117 is merged). The current policy is:

AgentMergeSortPolicy<256,
                     Nominal4BItemsToItems<KeyT>(11),
                     cub::BLOCK_LOAD_WARP_TRANSPOSE,
                     cub::LOAD_LDG,
                     cub::BLOCK_STORE_WARP_TRANSPOSE>;
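
To make the imbalance concrete, here is a rough back-of-the-envelope sketch in plain C++ (no CUB dependency). It assumes that Nominal4BItemsToItems<KeyT>(11) yields 11 items per thread for a 4-byte key and that the footprint is dominated by one tile-sized buffer per type; it does not model CUB's actual TempStorage layout. `Value128` and `tile_bytes` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit value type used only for this illustration.
struct Value128
{
  std::uint32_t words[4];
};
static_assert(sizeof(Value128) == 16, "Value128 is expected to be 16 bytes");

constexpr int block_threads = 256;
// Assumption: a 32-bit key leaves the nominal 11 items per thread unchanged.
constexpr int items_per_thread = 11;

// Bytes needed to stage one tile (threads * items) of elements of type T.
template <typename T>
constexpr std::size_t tile_bytes(int threads, int items)
{
  return static_cast<std::size_t>(threads) * static_cast<std::size_t>(items) * sizeof(T);
}

// Keys alone are modest, but values are four times larger per item.
constexpr std::size_t key_tile_bytes   = tile_bytes<std::uint32_t>(block_threads, items_per_thread);
constexpr std::size_t value_tile_bytes = tile_bytes<Value128>(block_threads, items_per_thread);
```

Under these assumptions, the tile size is chosen as if every item were 4 bytes wide, while the value tile is four times the size of the key tile.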

Describe the solution you'd like

I think we should pursue an approach similar to DeviceRadixSort, where we consider the larger of the two types for scaling down ITEMS_PER_THREAD (the DominantT definition below is an excerpt from our radix sort policy). This would reflect the effective shared memory requirements of our kernels.

// Dominant-sized key/value type
using DominantT = cub::detail::conditional_t<(sizeof(ValueT) > sizeof(KeyT)), ValueT, KeyT>;

AgentMergeSortPolicy<256,
                     Nominal4BItemsToItems<DominantT>(11),
                     cub::BLOCK_LOAD_WARP_TRANSPOSE,
                     cub::LOAD_LDG,
                     cub::BLOCK_STORE_WARP_TRANSPOSE>;
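
The effect of the proposal can be sketched in standalone C++. This is not CUB code: `nominal_4b_items_to_items` is a local stand-in that assumes the helper scales the nominal count by 4 / sizeof(T) and clamps it to [1, nominal], `dominant_t` mirrors the DominantT alias above using std::conditional_t in place of cub::detail::conditional_t, and `Value128` is a hypothetical 128-bit value type.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical 128-bit value type used only for this illustration.
struct Value128
{
  std::uint32_t words[4];
};

// Clamp the scaled item count to the range [1, nominal].
constexpr int clamp_items(int scaled, int nominal)
{
  return scaled < 1 ? 1 : (scaled > nominal ? nominal : scaled);
}

// Stand-in for Nominal4BItemsToItems (assumed behavior, see lead-in):
// the nominal count is tuned for 4-byte items and shrinks for larger types.
template <typename T>
constexpr int nominal_4b_items_to_items(int nominal)
{
  return clamp_items(nominal * 4 / static_cast<int>(sizeof(T)), nominal);
}

// Larger of the key and value types, as in the DominantT alias above.
template <typename KeyT, typename ValueT>
using dominant_t = std::conditional_t<(sizeof(ValueT) > sizeof(KeyT)), ValueT, KeyT>;

// Current behavior: only the 32-bit key is considered.
constexpr int key_only_items = nominal_4b_items_to_items<std::uint32_t>(11);

// Proposed behavior: the 128-bit value dominates and reduces the item count.
constexpr int dominant_items =
  nominal_4b_items_to_items<dominant_t<std::uint32_t, Value128>>(11);

// Pairs whose value is no larger than the key keep the key-based count.
constexpr int small_value_items =
  nominal_4b_items_to_items<dominant_t<std::uint32_t, std::uint8_t>>(11);
```

With this stand-in, a 32-bit key paired with a 128-bit value drops from 11 to 2 items per thread, while pairs whose value is no larger than the key are unaffected.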

Describe alternatives you've considered

No response

Additional context

No response

@elstehle elstehle added the feature request New feature or request. label Nov 22, 2023
@github-project-automation github-project-automation bot moved this to Todo in CCCL Nov 22, 2023