rocBLAS-2.24.0 for ROCm 3.6.0
New Features
- Improvements to User Guide and Design Document
- L1 dot function optimized to use shuffle instructions (improvements for bf16, f16, and f32 data types)
- L1 dot function: added an optimized kernel for the x dot x case
- L1 rocblas-bench standardized to use device pointer mode, so that benchmarks focus on GPU memory bandwidth
- Adjustments for hipcc (hip-clang) as the standard build compiler, and CentOS 8 support
- Added Fortran interface for all rocBLAS functions
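
For readers unfamiliar with shuffle-based reductions, the sketch below models the idea behind the dot optimization on the CPU. It is not the rocBLAS kernel: the names `kLanes`, `shuffle_reduce_model`, and `dot_model` are hypothetical, and the 64-lane width is an assumption chosen to resemble an AMD wavefront. Each lane accumulates a strided partial sum, then the lanes are combined in a tree, which is the step a GPU can perform with shuffle instructions rather than through shared memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed lane count for this model (hypothetical, resembles a 64-wide wavefront).
constexpr std::size_t kLanes = 64;

// CPU model of a shuffle-down tree reduction: at each step, lane i adds the
// value held by lane i + offset, and offset halves until lane 0 holds the sum.
float shuffle_reduce_model(std::array<float, kLanes> lanes) {
    for (std::size_t offset = kLanes / 2; offset > 0; offset /= 2) {
        for (std::size_t i = 0; i < offset; ++i) {
            lanes[i] += lanes[i + offset];
        }
    }
    return lanes[0];
}

// Dot product model: each lane accumulates products at a stride of kLanes,
// then the per-lane partial sums are combined by the tree reduction above.
float dot_model(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::array<float, kLanes> partial{};  // zero-initialized partial sums
    for (std::size_t i = 0; i < x.size(); ++i) {
        partial[i % kLanes] += x[i] * y[i];
    }
    return shuffle_reduce_model(partial);
}
```

Calling `dot_model(x, x)` corresponds to the x dot x case, where both operands are the same vector and a real kernel would only need to read one of them.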
Known Issues
- None