Add support for large num_items to DeviceReduce::{ArgMin,ArgMax} #2515
Labels: 2.8.0 (target for 2.8.0 release)
`DeviceReduce::Arg{Min,Max}` does not yet support more than `INT_MAX` items. Simply instantiating the kernel with a wider offset type degrades performance by as much as 38%. We want to mitigate the performance downside and also address #414 (comment).
Performance numbers from simply instantiating `DispatchReduce` with different offset types for `DeviceReduce::Arg{Min,Max}`: [benchmark tables not preserved]
➡ "streaming" reduction
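One way to read the "streaming" idea is to process the input in chunks small enough for a 32-bit offset, and to track the global position with a 64-bit offset only when combining per-chunk results. The following is a minimal host-side sketch of that idea in plain C++. It is not CUB's implementation: `chunk_arg_min` and `streaming_arg_min` are hypothetical names, and the per-chunk loop merely stands in for a single device-side reduction pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: arg-min over one chunk, indexed with a 32-bit offset only.
// Stands in for one reduction pass over a bounded number of items.
template <typename T>
std::pair<std::uint32_t, T> chunk_arg_min(const T* data, std::uint32_t count)
{
  std::uint32_t best = 0;
  for (std::uint32_t i = 1; i < count; ++i)
  {
    if (data[i] < data[best]) // strict '<' keeps the first occurrence of the minimum
    {
      best = i;
    }
  }
  return {best, data[best]};
}

// Hypothetical driver: walks the input chunk by chunk and combines the
// per-chunk results, translating each local 32-bit index to a global 64-bit one.
template <typename T>
std::pair<std::uint64_t, T>
streaming_arg_min(const T* data, std::uint64_t num_items, std::uint32_t chunk_size)
{
  assert(num_items > 0 && chunk_size > 0); // assumption: non-empty input

  std::uint64_t best_index = 0;
  T best_value             = data[0];

  for (std::uint64_t base = 0; base < num_items; base += chunk_size)
  {
    // The last chunk may be shorter than chunk_size.
    const auto count =
      static_cast<std::uint32_t>(std::min<std::uint64_t>(chunk_size, num_items - base));
    const auto local = chunk_arg_min(data + base, count);

    // Strict '<' means ties resolve to the earlier chunk, i.e. the smallest index.
    if (local.second < best_value)
    {
      best_value = local.second;
      best_index = base + local.first;
    }
  }
  return {best_index, best_value};
}
```

For illustration, calling `streaming_arg_min` on the eight values `{5, 3, 9, 3, 1, 7, 1, 8}` with a `chunk_size` of 3 returns index 4 and value 1. In a real setting the chunk size would presumably be chosen close to `INT_MAX` rather than a small number, so that the inner pass keeps the cheaper 32-bit indexing while the total item count can exceed it.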
Performance comparison for `old.main.i64` offset type vs. streaming approach, summary for all rows: [table not preserved]
Performance comparison for `old.main.i32` vs. streaming approach, summary for rows where Elements{io} is 2^28: [table not preserved]