6.4 version fix and mergeback 6.3 hotfixes (#650)
* Remove website URL from comments (#600)

Referencing or using code from some websites is prohibited in this repository.
This change removes an informational reference in the comments.

* Fix rare memory access faults when using internal serial merge (#597)

* test: add tests for internal serial merge function

* refactor(detail/merge_path.hpp): removed code duplication

* fix(detail/merge_path.hpp): stricter boundary checking in serial merge

* fix(detail/block_sort_merge.hpp): fix missing block-wide sync

After a previous refactor, serial_merge no longer performed a block-wide sync. This has now been re-added.

* feat: add unsafe variant of serial merge

* fix: use bounded version for serial merge to fix rare page faults

* test(test_internal_merge_path): clean up internal merge path tests

* style: standardize range_t<> construction

* fix(detail/merge_path.hpp): fix 'range_t<>::count1()' and 'range_t<>::count2()' return types to be same as encapsulated type

* perf(detail/merge_path.hpp): use const ref in function parameters

* refactor(detail/merge_path.hpp): replace redundant use of 'OffsetT' with 'unsigned int'

* chore: update changelog

* fix: restore missing thread sync

This got removed during a rebase.

* Add gfx1151 target (#601) (#603)

Co-authored-by: Stanley Tsang <[email protected]>

* Merge back 6.2 hotfixes (#607) (#620)

* Update dependency names for static builds (#557)

This also removes the line setting `BUILD_SHARED_LIBS` to `ON`, which was previously required to get the correctly named packages when not specifically compiling for a static build. Updates to the ROCmCMakeBuildTools (rocm-cmake) should mean this is no longer necessary.

* Fix BUILD_SHARED_LIBS for packaging (#558)

* Fix the dependencies of the static packages (#563)

* cmake: don't set CMAKE_C_COMPILER, as rocPRIM is a CXX project (#567)

* add developer guidelines (#555) (#574)



* Update Read the Docs config to Python 3.10 and latest rocm-docs-core (#564) (#579)

* Cherry-pick: Optimize block_reduce_warp_reduce when block size is the same as warp size (#599)

* Optimize block_reduce_warp_reduce when block size == warp size

* Make conditional constexpr

* Fix conflict in concepts.rst

---------

Co-authored-by: Lauren Wrubleski <[email protected]>
Co-authored-by: Steve Leung <[email protected]>
Co-authored-by: randyh62 <[email protected]>
Co-authored-by: Nol Moonen <[email protected]>
Co-authored-by: Sam Wu <[email protected]>

* Changed precondition for edge case in serial_merge to prevent assertion error (#622)

* added std::min to ensure no out-of-bounds access

* fixed typo keys->keys1

* updated changelog

* reverted std::min

* implemented suggested logic

* edited to conform to standards (#618)

* Memory leak fix for multiple rocPRIM unit tests (#614)

* fixed mem leak in test_config_dispatch.cpp

* added missing hip free for method==4 in test_block_scan.kernels

* added GraphHelper class that does not cause memory leak due to using hipGraphCreate

* replaced old hipGraph helpers with new class in device_bin_search

* changed HIP_CHECK_NON_VOID to HIP_CHECK

* fixed mem leak in device_bin_search

* added additional functions

* changed out old calls to hipGraphCreate to new GraphHelper class

* added missing stream sync for hipgraph_algs

* n

* added missing hipFree and HIP_CHECK for lookback_reproducibility

* added missing hipFree in test_discard_iterator

* fixed test failures

* removed extra hipFree

* removed unused variables

* updated change log

* removed redundant function

---------

Co-authored-by: Your Name <[email protected]>
Co-authored-by: root <[email protected]>

* updated the changelog for 6.3 (#632)

* updated default gpu to include gfx12 and gfx1151

* updated changelog

* fixed minor grammar mistake in changelog

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <[email protected]>

* Remove gfx940,gfx941 targets (#639)

* Update rocPRIM version

---------

Co-authored-by: Wayne Franz <[email protected]>
Co-authored-by: Nara <[email protected]>
Co-authored-by: amd-garydeng <[email protected]>
Co-authored-by: Lauren Wrubleski <[email protected]>
Co-authored-by: Steve Leung <[email protected]>
Co-authored-by: randyh62 <[email protected]>
Co-authored-by: Nol Moonen <[email protected]>
Co-authored-by: Sam Wu <[email protected]>
Co-authored-by: Di Nguyen <[email protected]>
Co-authored-by: spolifroni-amd <[email protected]>
Co-authored-by: Your Name <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Val Movsik <[email protected]>
14 people authored Nov 21, 2024
1 parent 4008834 commit ffd6685
Showing 3 changed files with 10 additions and 3 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -2,7 +2,7 @@

Full documentation for rocPRIM is available at [https://rocm.docs.amd.com/projects/rocPRIM/en/latest/](https://rocm.docs.amd.com/projects/rocPRIM/en/latest/).

-## (Unreleased) rocPRIM 3.4.0 for ROCm 6.4.0
+## rocPRIM 3.4.0 for ROCm 6.4.0

### Added

@@ -42,10 +42,13 @@ Full documentation for rocPRIM is available at [https://rocm.docs.amd.com/projec
### Upcoming changes
* Using the initialisation constructor of `rocprim::reverse_iterator` will throw a deprecation warning. It will be marked as explicit in the next major release.

* Using the initialisation constructor of rocprim::reverse_iterator will throw a deprecation warning. It will be marked as explicit in the next major release.

## rocPRIM 3.3.0 for ROCm 6.3.0

### Added

* Changed the default value of `rmake.py -a` to `default_gpus`. This is equivalent to `gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201`.
* The `--test smoke` option has been added to `rtest.py`. When `rtest.py` is called with this option it runs a subset of tests such that the total test time is 5 minutes. Use `python3 ./rtest.py --test smoke` or `python3 ./rtest.py -t smoke` to run the smoke test.
* The `--seed` option has been added to `run_benchmarks.py`. The `--seed` option specifies a seed for the generation of random inputs. When the option is omitted, the default behavior is to use a random seed for each benchmark measurement.
* Added configuration autotuning to device partition (`rocprim::partition`, `rocprim::partition_two_way`, and `rocprim::partition_three_way`), to device select (`rocprim::select`, `rocprim::unique`, and `rocprim::unique_by_key`), and to device reduce by key (`rocprim::reduce_by_key`) to improve performance on selected architectures.
@@ -67,6 +70,7 @@ Full documentation for rocPRIM is available at [https://rocm.docs.amd.com/projec

### Resolved issues

* Fixed an issue in `rmake.py` where the list storing cmake options would contain individual characters instead of a full string of options.
* Resolved an issue in `rtest.py` where it crashed if the `build` folder was created without `release` or `debug` subdirectories.
* Resolved an issue with `rtest.py` on Windows where passing an absolute path to `--install_dir` caused a `FileNotFound` error.
* rocPRIM functions are no longer forcefully inlined on Windows. This significantly reduces the build
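The `rmake.py` entry under Resolved issues above describes a list of cmake options that ended up holding individual characters. The commit does not show the faulty code, so the following is only a sketch of one common way this symptom arises in Python: extending a list with a string. The names `extra`, `broken_options`, and `fixed_options` are hypothetical and are not taken from `rmake.py`.

```python
# Hypothetical illustration; this is not the actual rmake.py code.
extra = "-DBUILD_TEST=ON"  # example option string, chosen for illustration

# Pitfall: `+=` on a list iterates over the right-hand side,
# so a string is added one character at a time.
broken_options = []
broken_options += extra

# Appending keeps the whole option as a single list element.
fixed_options = []
fixed_options.append(extra)

print(broken_options[:3])  # ['-', 'D', 'B']
print(fixed_options)       # ['-DBUILD_TEST=ON']
```

Whether the original bug had exactly this shape is an assumption; the sketch only shows why a string and a list of strings need to be kept apart when building a command line.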
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -106,7 +106,7 @@ if(GPU_TARGETS STREQUAL "all")
)
else()
rocm_check_target_ids(DEFAULT_AMDGPU_TARGETS
-TARGETS "gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1200;gfx1201"
+TARGETS "gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1200;gfx1201"
)
endif()
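The two `TARGETS` strings in the hunk above differ only slightly and are hard to compare by eye. As a small check, the sketch below splits both strings (copied from the diff) on `;` and lists the entries present in one but not the other. The variable names are illustrative only.

```python
# Target lists copied verbatim from the CMakeLists.txt hunk above.
old_targets = (
    "gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;"
    "gfx90a:xnack+;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;"
    "gfx1102;gfx1151;gfx1200;gfx1201"
).split(";")
new_targets = (
    "gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;"
    "gfx90a:xnack+;gfx942;gfx1030;gfx1100;gfx1101;"
    "gfx1102;gfx1151;gfx1200;gfx1201"
).split(";")

# Order-preserving difference in each direction.
removed = [t for t in old_targets if t not in new_targets]
added = [t for t in new_targets if t not in old_targets]

print(removed)  # ['gfx940', 'gfx941']
print(added)    # []
```

This matches the "Remove gfx940,gfx941 targets (#639)" entry in the commit message: two targets are dropped and none are added.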

5 changes: 4 additions & 1 deletion rmake.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,9 @@ def parse_args():
parser = argparse.ArgumentParser(description="""
Checks build arguments
""")
+
+default_gpus = 'gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201'
+
parser.add_argument('-g', '--debug', required=False, default=False, action='store_true',
help='Generate Debug build (default: False)')
parser.add_argument( '--build_dir', type=str, required=False, default="build",
@@ -37,7 +40,7 @@ def parse_args():
help='Install after build (default: False)')
parser.add_argument( '--cmake-darg', required=False, dest='cmake_dargs', action='append', default=[],
help='List of additional cmake defines for builds (e.g. CMAKE_CXX_COMPILER_LAUNCHER=ccache)')
-parser.add_argument('-a', '--architecture', dest='gpu_architecture', required=False, default="gfx906;gfx1030;gfx1100;gfx1101;gfx1102", #:sramecc+:xnack-" ) #gfx1030" ) #gfx906" ) # gfx1030" )
+parser.add_argument('-a', '--architecture', dest='gpu_architecture', required=False, default=default_gpus, #:sramecc+:xnack-" ) #gfx1030" ) #gfx906" ) # gfx1030" )
help='Set GPU architectures, e.g. all, gfx000, gfx803, gfx906:xnack-;gfx1030;gfx1100 (optional, default: all)')
parser.add_argument('-v', '--verbose', required=False, default=False, action='store_true',
help='Verbose build (default: False)')
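To see the effect of the new default in isolation, here is a minimal, self-contained sketch that reproduces only the `-a` argument from the hunk above. The `default_gpus` value is copied from the diff. The `split_targets` helper is hypothetical and is not part of `rmake.py`; it accepts both `,` and `;` as separators because the old default used semicolons and the new one uses commas. How `rmake.py` itself passes the value on to CMake is not shown in this commit.

```python
import argparse
import re

# Default copied from the rmake.py diff above.
default_gpus = 'gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201'


def parse_args(argv=None):
    """Reduced parser containing only the architecture option."""
    parser = argparse.ArgumentParser(description="Checks build arguments")
    parser.add_argument('-a', '--architecture', dest='gpu_architecture',
                        required=False, default=default_gpus,
                        help='Set GPU architectures')
    return parser.parse_args(argv)


def split_targets(architectures):
    """Hypothetical helper: split a target string on ',' or ';'."""
    return [t for t in re.split(r'[;,]', architectures) if t]


# Without -a, the parser falls back to default_gpus.
args = parse_args([])
print(split_targets(args.gpu_architecture))

# An explicit -a value overrides the default.
custom = parse_args(['-a', 'gfx942'])
print(split_targets(custom.gpu_architecture))
```

Keeping the default in a named variable, as the commit does, means the value can be reused or changed in one place instead of being embedded in the `add_argument` call.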
