
rocBLAS 2.45.0 for ROCm 5.3.0

@lawruble13 lawruble13 released this 30 Sep 19:25
7294a70

Added

  • install.sh option --upgrade_tensile_venv_pip to upgrade Pip in Tensile Virtual Environment. The corresponding CMake option is TENSILE_VENV_UPGRADE_PIP.
  • install.sh option --relocatable or -r adds rpath and removes ldconf entry on rocBLAS build.
  • install.sh option --lazy-library-loading to enable on-demand loading of Tensile library files at runtime to speed up rocBLAS initialization.
  • Support for RHEL9 and CS9.
  • Numerical checking routine for symmetric, Hermitian, and triangular matrices, so that they can be checked for numerical abnormalities such as NaN, zero, infinity, and denormal values.
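
As an illustration of what such a numerical check involves, here is a minimal host-side sketch in C++. It is not the rocBLAS routine: the names NumericalReport and check_triangle are hypothetical, and the column-major layout with a leading dimension is an assumption borrowed from BLAS conventions. It scans only the stored triangle of a matrix and records which abnormal value classes appear.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical report type (not part of rocBLAS): one flag per value class.
struct NumericalReport {
    bool has_nan      = false;
    bool has_inf      = false;
    bool has_zero     = false;
    bool has_denormal = false;
};

// Scan the stored triangle of a column-major n x n matrix with leading
// dimension lda. With upper == true, entries with row <= col are read;
// otherwise entries with row >= col are read. The other triangle is ignored,
// as it would be for a symmetric, Hermitian, or triangular matrix.
inline NumericalReport check_triangle(const std::vector<double>& a,
                                      std::size_t n, std::size_t lda, bool upper)
{
    NumericalReport r;
    for (std::size_t col = 0; col < n; ++col) {
        const std::size_t row_begin = upper ? 0 : col;
        const std::size_t row_end   = upper ? col + 1 : n;
        for (std::size_t row = row_begin; row < row_end; ++row) {
            switch (std::fpclassify(a[row + col * lda])) {
                case FP_NAN:       r.has_nan = true;      break;
                case FP_INFINITE:  r.has_inf = true;      break;
                case FP_ZERO:      r.has_zero = true;     break;
                case FP_SUBNORMAL: r.has_denormal = true; break;
                default:           break; // normal value
            }
        }
    }
    return r;
}
```

Restricting the scan to one triangle matters because the other triangle of such matrices is typically not referenced and may hold arbitrary data.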

Optimizations

  • trmm_outofplace performance improvements for all sizes and data types using block-recursive algorithm.
  • herkx performance improvements for all sizes and data types using block-recursive algorithm.
  • syrk/herk performance improvements by utilising optimised syrkx/herkx code.
  • symm/hemm performance improvements for all sizes and data types using block-recursive algorithm.
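
To make the term "block-recursive" concrete, the following is a small CPU sketch of a recursive rank-k update on the lower triangle, C += A * A^T. It is only an illustration of the general idea and makes no claim about how the rocBLAS GPU kernels are organized; the function name syrk_lower_recursive, the base-case size, and the column-major layout are assumptions. Each diagonal block is split in half, the two smaller diagonal blocks are handled recursively, and the remaining off-diagonal rectangle is a plain matrix product.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: C (N x N, column-major, lower triangle only) is updated
// with C += A * A^T, where A is N x k, column-major. The call works on the
// diagonal block covering rows and columns [r0, r0 + n).
inline void syrk_lower_recursive(const std::vector<double>& A, std::vector<double>& C,
                                 std::size_t N, std::size_t k,
                                 std::size_t r0, std::size_t n, std::size_t base = 2)
{
    if (n <= base) {
        // Base case: update the small diagonal block directly.
        for (std::size_t j = r0; j < r0 + n; ++j)
            for (std::size_t i = j; i < r0 + n; ++i) {
                double sum = 0.0;
                for (std::size_t p = 0; p < k; ++p)
                    sum += A[i + p * N] * A[j + p * N];
                C[i + j * N] += sum;
            }
        return;
    }
    const std::size_t h = n / 2;
    // Two diagonal sub-blocks are again symmetric updates: recurse.
    syrk_lower_recursive(A, C, N, k, r0, h, base);
    syrk_lower_recursive(A, C, N, k, r0 + h, n - h, base);
    // The off-diagonal rectangle (rows [r0+h, r0+n), columns [r0, r0+h)) is a
    // general matrix product, which is where a tuned gemm would be used.
    for (std::size_t j = r0; j < r0 + h; ++j)
        for (std::size_t i = r0 + h; i < r0 + n; ++i) {
            double sum = 0.0;
            for (std::size_t p = 0; p < k; ++p)
                sum += A[i + p * N] * A[j + p * N];
            C[i + j * N] += sum;
        }
}
```

Calling syrk_lower_recursive(A, C, N, k, 0, N) updates the whole lower triangle. The appeal of this structure is that most of the arithmetic ends up in the rectangular products.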

Changed

  • Unifying library logic file names: affects HBH (->HHS_BH), BBH (->BBS_BH), 4xi8BH (->4xi8II_BH). All HPA types are using the new naming convention now.
  • Level 3 function argument checking: when the handle is set to rocblas_pointer_mode_host, functions now return rocblas_status_invalid_pointer only for pointers that must be dereferenced based on the alpha and beta argument values. With rocblas_pointer_mode_device, only pointers that are always dereferenced regardless of alpha and beta values are checked, so only those may lead to a return status of rocblas_status_invalid_pointer. This improves consistency with legacy BLAS behaviour.
  • Level 1, 2, and 3 function argument checking for enums now matches legacy BLAS more rigorously, so functions return rocblas_status_invalid_value if arguments do not match the accepted subset.
  • Added quick-return for internal trmm and gemm template functions.
  • Moved function block sizes to a shared header file.
  • Level 1, 2, and 3 functions use rocblas_stride datatype for offset.
  • Modified the matrix and vector memory allocation in our test infrastructure for all Level 1, 2, 3 and BLAS_EX functions.
  • Added specific initialization for symmetric, Hermitian, and triangular matrix types in our test infrastructure.
  • Added NaN tests to the test infrastructure for the rest of Level 3, BLAS_EX functions.
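
The pointer-checking change for Level 3 functions described above can be pictured with a small model. The sketch below is one possible reading for a gemm-like update C = alpha*A*B + beta*C, not the rocBLAS source: the enums PointerMode and Status and the function check_gemm_like_pointers are hypothetical stand-ins, and treating C as the pointer that is always dereferenced is an assumption made for illustration.

```cpp
// Hypothetical stand-ins for the rocBLAS pointer mode and status enums.
enum class PointerMode { host, device };
enum class Status { success, invalid_pointer };

// Model of argument checking for C = alpha*A*B + beta*C (assumed semantics).
inline Status check_gemm_like_pointers(PointerMode mode,
                                       const float* alpha, const float* beta,
                                       const float* A, const float* B, const float* C)
{
    if (!alpha || !beta)
        return Status::invalid_pointer;

    if (mode == PointerMode::host) {
        // alpha and beta are readable on the host, so the check can follow
        // what the computation would actually touch.
        if (*alpha == 0.0f && *beta == 1.0f)
            return Status::success; // C is left unchanged: nothing is dereferenced
        if (!C)
            return Status::invalid_pointer;
        if (*alpha != 0.0f && (!A || !B))
            return Status::invalid_pointer; // A and B are only needed when alpha != 0
        return Status::success;
    }

    // Device mode: alpha and beta live in device memory and are not read here,
    // so only the pointer assumed to be always dereferenced is checked.
    return C ? Status::success : Status::invalid_pointer;
}
```

Under this model, a null A with alpha equal to zero in host mode is accepted, which mirrors legacy BLAS, where A is not referenced in that case.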

Fixed

  • Improved logic to #include <filesystem> vs <experimental/filesystem>.
  • install.sh -s option to build rocblas as a static library.
  • dot function now sets the device results asynchronously for N <= 0.

Deprecated

  • is_complex helper is now deprecated. Use rocblas_is_complex instead.
  • The enum truncate_t and the value truncate are now deprecated and will be removed in ROCm release 6.0. They are replaced by rocblas_truncate_t and rocblas_truncate, respectively. The new enum rocblas_truncate_t and the value rocblas_truncate can be used starting with this ROCm release for an easy transition.

Removed

  • install.sh options --hip-clang, --no-hip-clang, --merge-files, and --no-merge-files are removed.