Commit
* Do not call fence in the wait loop

* Use __hip_atomic_load/store instead of atomicExch/atomicAdd

  atomicExch is compiled to global_atomic_swap even when the result is not used.

* Use faster fences in lookback algorithms on gfx94*

  This version is specific to devices with a slow __threadfence (an "agent" fence, which flushes and invalidates the L2 cache). Fences with "workgroup" scope are used instead to ensure ordering only, not coherence; they do not flush or invalidate the cache. Global coherence of prefixes_*_values is ensured by atomic_load/atomic_store, which bypass the cache.

* Rename ROCPRIM_LOOKBACK_WITHOUT_SLOW_FENCES to ROCPRIM_DETAIL_LOOKBACK_SCAN_STATE_WITHOUT_SLOW_FENCES

  The new name is more verbose to communicate that it is an implementation detail. It now uses the values 0 and 1 instead of the presence of the macro, and it won't be overridden if set by a developer on the command line.

* Add WITHOUT_SLOW_FENCES version to lookback_scan_state::get_complete_value

* refactor: lookback_scan_state WITHOUT_SLOW_FENCES misc changes

  - use sizeof(variable)
  - use auto* and const auto* instead of just auto
  - use void* instead of char* to avoid yet another cast
  - make the atomic order fence a separate function and add docs & warning

* fix: Restore removed interfaces of lookback_scan_state

  Even though these are in the detail namespace, and as such are explicitly not meant for use by users, some projects did start depending on them. The interfaces are slightly broken, and rocPRIM developers discourage users from using them (or the newer interfaces, for that matter) because they are implementation details. No further guarantees are provided for these APIs. A public interface for lookback_scan_state is planned, as we have recognized that this is a useful primitive and that it is unreasonable to expect users to implement it themselves.

* refactor: rename __builtin_amdgcn_fence as atomic_fence_acquire_order_only

---------

Co-authored-by: Anton Gorenko <[email protected]>
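
The 0/1 macro convention described above can be sketched as follows. This is a minimal illustration, not rocPRIM's actual code: the macro name is taken from the commit message, but the gfx94* detection condition, the fence_scope enum, and the lookback_fence_scope() helper are assumptions introduced only for this example.

```cpp
// Sketch of a 0/1 configuration macro that respects a command-line override.
// If a developer passes -DROCPRIM_DETAIL_LOOKBACK_SCAN_STATE_WITHOUT_SLOW_FENCES=0
// (or =1), the #ifndef guard leaves that value untouched.
#ifndef ROCPRIM_DETAIL_LOOKBACK_SCAN_STATE_WITHOUT_SLOW_FENCES
    // ASSUMPTION: the compiler's per-target macros are used to detect gfx94*.
    // The real detection logic in rocPRIM may differ.
    #if defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
        #define ROCPRIM_DETAIL_LOOKBACK_SCAN_STATE_WITHOUT_SLOW_FENCES 1
    #else
        #define ROCPRIM_DETAIL_LOOKBACK_SCAN_STATE_WITHOUT_SLOW_FENCES 0
    #endif
#endif

// Hypothetical helper type, used only to make the selection observable here.
enum class fence_scope
{
    agent,    // full fence: ordering plus cache flush/invalidate
    workgroup // ordering only, no cache flush/invalidate
};

// Because the macro is always defined to 0 or 1, it is tested with #if,
// not #ifdef. Testing with #ifdef would treat an explicit 0 as "enabled".
constexpr fence_scope lookback_fence_scope()
{
#if ROCPRIM_DETAIL_LOOKBACK_SCAN_STATE_WITHOUT_SLOW_FENCES
    return fence_scope::workgroup;
#else
    return fence_scope::agent;
#endif
}
```

With this shape, building with the macro set to 0 on the command line would select the "agent" path even on a target where the default would be 1, which is the override behavior the commit message describes.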
Showing 6 changed files with 222 additions and 50 deletions.