[FEA]: Make `DeviceMergeSort` also consider `ValueT` for scaling down `ITEMS_PER_THREAD` in its policy #1141
Labels
feature request
Is this a duplicate?
Area
CUB
Is your feature request related to a problem? Please describe.
In most of our tuning policies we scale down `ITEMS_PER_THREAD` as the type the algorithm works on gets larger. The main motivation is to keep shared memory and/or register usage in our kernels roughly constant despite varying data type sizes.
In `DeviceMergeSort`, we scale down `ITEMS_PER_THREAD` as `sizeof(KeyT)` grows. However, when sorting pairs, we do not consider `ValueT` when reducing `ITEMS_PER_THREAD`.
As a result, when using, for instance, a 128-bit value type along with a 32-bit key type, we need to use virtual shared memory (in our current implementation) or the fallback policy (once #1117 is merged).
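To make the footprint argument concrete, here is a minimal host-side sketch. It is not CUB code: `scale_items`, `items_key_only`, `items_larger_type`, `tile_bytes`, and `Value128` are hypothetical names, the scaling rule is a simplified stand-in for the policy helper, and the nominal 17 items per thread and 256 threads per block are assumed values chosen only for illustration. The footprint model (items times the larger of the two type sizes) is likewise an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit value type, used only for illustration.
struct Value128
{
  std::uint64_t lo;
  std::uint64_t hi;
};

// Simplified scaling rule (an assumption, not CUB's actual helper):
// keep the nominal count for types of up to 4 bytes, shrink it
// proportionally for larger types, and never go below 1.
constexpr int scale_items(int nominal_4b_items, std::size_t type_size)
{
  return std::min(nominal_4b_items,
                  std::max(1, nominal_4b_items * 4 / static_cast<int>(type_size)));
}

// Scaling that looks at the key type only.
template <typename KeyT, typename ValueT>
constexpr int items_key_only(int nominal)
{
  return scale_items(nominal, sizeof(KeyT));
}

// Scaling that looks at the larger of key and value type.
template <typename KeyT, typename ValueT>
constexpr int items_larger_type(int nominal)
{
  return scale_items(nominal, std::max(sizeof(KeyT), sizeof(ValueT)));
}

// Assumed footprint model: a tile needs room for
// threads * items elements of the larger of the two types.
template <typename KeyT, typename ValueT>
constexpr std::size_t tile_bytes(int items_per_thread, int threads)
{
  return static_cast<std::size_t>(items_per_thread) * threads
       * std::max(sizeof(KeyT), sizeof(ValueT));
}

// 32-bit keys with a 128-bit value: key-only scaling keeps all 17 items.
static_assert(items_key_only<std::uint32_t, Value128>(17) == 17, "");
// Scaling by the larger type reduces the count to 4.
static_assert(items_larger_type<std::uint32_t, Value128>(17) == 4, "");
```

Under these assumptions, a 256-thread tile of 32-bit keys with 128-bit values needs `tile_bytes<std::uint32_t, Value128>(17, 256)` = 69632 bytes with key-only scaling, which exceeds the 48 KiB of static shared memory a block can use by default. Scaling by the larger type gives 16384 bytes, close to the 17408 bytes of the 32-bit keys-only case.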
Describe the solution you'd like
I think we should pursue a similar approach to `DeviceRadixSort`, where we consider the larger of the two types (excerpt from our radix sort policy) for scaling down `ITEMS_PER_THREAD`. This would reflect the effective shared memory requirements in our kernels.
Describe alternatives you've considered
No response
Additional context
No response