Commit
* remove HIP-CPU support
* Resolve: IssueMove ROCPRIM_DETAIL_HIP_SYNC_AND_RETURN_ON_ERROR to separate header file
* rebase and add RETURN_ON_ERROR to the header
* Added naive implementation for adjacent_find plus tests and benchmarks
* Improved benchmark by only taking into account relevant processed elements
* Use a faster reduction operation
* Added block-reduction kernel with early exit
* Improved test with random first pair
* Get grid_size for maximum occupancy
* Improved test coverage
* Implement early exit with sequential blocks execution
* Use a dynamic tile_id as in find_first_of for faster stable results
* Added documentation for adjacent_find
* Added tuning for adjacent_find
* Modified tuning so that non-arithmetic types use default configs
* Changed initialization mechanism of kernel's output element
* Fixed tests from review comments
  - Simplified adjacent_find_impl functor definition
  - Added test for indirect_iterator
* Simplified input transform logic
* Added tuned configs
* Removed duplicated ROCPRIM_DETAIL_HIP_SYNC_AND_RETURN_ON_ERROR
* Resolve "Refactor benchmarks to use a byte-based size"
* Added a rocprim::numeric_limits to support uint128 and int128 and changed all std::numeric_limits to test_utils::numeric_limits
* Create generate_limit to ensure floating point custom types are handled correctly.
* Add rocprim::numeric_limits to numeric_limits_custom_test_type
* Expected output fix block_radix_sort test for custom_test_type<float> and custom_test_type<double>
* Docs fix numeric_limits
* Added numeric_limits to changelog
* Added a rocprim::uint128_t and rocprim::int128_t
* Implemented find_end with tests and benchmark
* Updated find_end benchmark with generate_limits
* Added different input pattern for benchmark and added multiple items per thread
* Added different key_size to tests for find_end
* Added shared memory kernel for find_end
* Changed find_end to search with reverse iterator
* Added tests for different compare function
* Change benchmark to no longer early exit and choosing shared mem kernel as config variable
* Extra check search kernel to prevent unnecessary global search
* Documentation for find_end
* Changed find_end to make it easier to create search
* Fix docs errors find_end
* Changes for reviews find_end
* Fix rebasing issues find_end
* Added find_end to rocprim header
* Fix build error after adding headers
* Use byte-based size in benchmark
* Remove double defines
* Added search function with tests and benchmark
* Fix documentation find_end and search
* Add device_search to rocprim.hpp header
* add device_ptr utility Authored-By: Cenxuan Tian <[email protected]>
* replace high_resolution_clock with steady_clock Authorized-By: Cenxuan Tian <[email protected]>
* properly namespace ROCPRIM_RETURN_ON_ERROR
* Set c++ version to 17 and create warning
* Fix no_discard warning c++17
* Set CI tests to c++14
* Build for both c++ 14 and 17
* Add large sizes test to device_radix_sort
* Added more test coverage segmented_radix_sort
* fix not working with const_iterators
* fix: use bytes instead of size for scan tuning benchmarks
* Resolve "Partial sort optimization: make use of radix sort"
* doc: address the upper bound restrictions on Channels for device_histogram
* doc: explicitly state that ActiveChannels is bounded by Channels
* batch memcpy tests with random seed
* follow clang format
* add newline at the end
* make rocprim::reverse_iterator align with that of std
* minor change
* add constexpr
* adjust format
* add warnings
* adjust format
* change the way of triggering warnings
* adjust format
* minor change
* adjust format
* clear warnings
* adjust format
* correct warning behaviours
* adjust format
* adjust format
* update changelog and fix warning issue
* fix ambiguous issue
* move a CHANGELOG entry to Deprecations section
* feat: add support for predicated flagged device select
* feat: add tests (with large indices) for predicated flagged device select
* feat: add config tuning and benchmarks for predicated and flagged device select
* fix: add missing template parameter to partition-based autotune templates
* Add tuned configs
* Fix clang-format hang
* Fix ambiguous error make_reverse_iterator
* Resolve "Config tuning and dynamic dispatch for device merge"
* add search_n algo
* add test
* Add google test for search_n & tested the functionality
* Add benchmark
* Add Doc & add custom type for benchmark
* Remove unused variables
* Add NonBlockStream support
* Remove unused type alias
* Refactor search_n for loop, &dit comments &
* Add More tests & Fixed some bugs
* Add more benchmarks
* Add document
* Refactor benchmarks
* Replace another DOXYGEN_DOCUMENTATION_BUILD and some minor modifications
* Fix build debug error
* Optimize algo with large input
* add impl2
* Optimize
* Move hipMalloc vars to temp_memory
* Rewrite benchmarks
* Resolve
* Fix bugs -- several occurrences of consecutive full blocks
* Many modifications, fixed the bugs and edited the tests and benchmarks
* Optimised the block_search_n_kernel
* 2nd version search_n implementation for large input
* Add thread level search_n algorithm
* Add optimizations
* Edit benchmarks
* remove unused variables
* remove unused variables and remove __restrict__
* fix the bug on windows
* fix bug and modify benchmarks and tests
* fix bugs in benchmarks and search_n_impl
* Oh yes
* Apply 1 suggestion(s) to 1 file(s) Co-authored-by: Beatriz Navidad Vilches <[email protected]>
* apply some suggestions
* edit doc
* replace search_n_min_kernal by rocprim::reduce
* fixed some benchmarks bugs
* remove graph support
* resolve not compile on win
* Add graph support and modified the design a little
* resolve test fail on windows
* fix gfx960 benchmark dead lock
* Add device_search_n to rocprim.hpp
* replace HIP_CHECK by ROCPRIM_RETURN_ON_ERROR
* fix: fix doxygen error due to __launch_bounds__ macro
* Implement 6.3 hotfixes for added/modified tests
* Workaround CI memory usage limit
* Reduce memory usage even more

---------

Co-authored-by: Robin Voetter <[email protected]>
Co-authored-by: Cenxuan Tian <[email protected]>
Co-authored-by: Milo Lurati <[email protected]>
Co-authored-by: Nick Breed <[email protected]>
Co-authored-by: Bence Parajdi <[email protected]>
Co-authored-by: Yung-sheng Tu <[email protected]>