Releases: ROCm/rocBLAS

rocBLAS-2.32.0 for ROCm 4.0.0

18 Dec 15:22

saadrahim

New Features

No new features

Known Issues

None

rocBLAS-2.32.0 for ROCm 3.10.0

30 Nov 17:02

saadrahim

New Features

Improved performance of gemm_batched for the NN case, for both general and small m, n, k

Known Issues

None

rocBLAS-2.30.0 for ROCm 3.9.0

27 Oct 20:13

saadrahim

New Features

Slight improvements to FP16 Megatron BERT performance on MI50
Improvements to FP16 Transformer performance on MI50
Slight improvements to FP32 Transformer performance on MI50

Known Issues

None

rocBLAS-2.28.0 for ROCm 3.8.0

18 Sep 21:32

saadrahim

New Features

Added atomics_mode functions:
- rocblas_status rocblas_set_atomics_mode(rocblas_atomics_mode mode);
- rocblas_status rocblas_get_atomics_mode(rocblas_atomics_mode mode);
Added enum rocblas_atomics_mode, which can take two values:
- rocblas_atomics_allowed
- rocblas_atomics_not_allowed
The default is rocblas_atomics_not_allowed.
Corrected the rocblas_Xdgmm algorithm and added support for incx=0
Additional dependencies needed:
- rocblas-tensile internal component requires msgpack instead of LLVM
Moved the following files from /opt/rocm/include to /opt/rocm/include/internal:
- rocblas-auxillary.h
- rocblas-complex-types.h
- rocblas-functions.h
- rocblas-types.h
- rocblas-version.h
- rocblas_bfloat16.h
These files should NOT be included directly, as this may lead to errors. Instead, include /opt/rocm/include/rocblas.h. /opt/rocm/include/rocblas_module.f90 can also be used directly.
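To make the rocblas_Xdgmm item above concrete, here is a minimal CPU reference sketch of the operation as it is commonly documented: C = A * diag(x) for the right side and C = diag(x) * A for the left side, with column-major storage. The names ref_side and ref_sdgmm are hypothetical and are not part of rocBLAS, and the sketch assumes incx >= 0. It is not the library's GPU kernel; it only illustrates what incx=0 means, namely that every diagonal entry is read from x[0].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; this is not rocBLAS code. */
typedef enum { SIDE_LEFT, SIDE_RIGHT } ref_side;

/* CPU reference for single-precision dgmm, column-major storage:
 *   SIDE_RIGHT: C = A * diag(x), x has n entries
 *   SIDE_LEFT:  C = diag(x) * A, x has m entries
 * Assumes incx >= 0. With incx == 0 every diagonal entry is x[0]. */
static void ref_sdgmm(ref_side side, int m, int n,
                      const float *A, int lda,
                      const float *x, int incx,
                      float *C, int ldc)
{
    assert(m >= 0 && n >= 0 && lda >= m && ldc >= m && incx >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* Right side scales column j by x[j]; left side scales row i by x[i]. */
            int k = (side == SIDE_RIGHT) ? j : i;
            C[i + (size_t)j * ldc] =
                A[i + (size_t)j * lda] * x[(size_t)k * incx];
        }
    }
}
```

Calling ref_sdgmm with incx = 0 scales the whole matrix by the single value x[0], which is a quick way to check a GPU result against this reference.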

Known Issues

None

rocBLAS-2.26.0 for ROCm 3.7.0

15 Aug 04:26

saadrahim

New Features

Improvements to User Guide and Design Document
L1 dot function optimized to utilize shuffle instructions (improvements on bf16, f16, f32 data types)
L1 dot function: added an optimized kernel for the x dot x case
Standardization of L1 rocblas-bench to use device pointer mode to focus on GPU memory bandwidth
Adjustments for hipcc (hip-clang) compiler as standard build compiler and CentOS 8 support
Added Fortran interface for all rocBLAS functions
Improvements to rocblas_Xgemm_batched performance for small m, n, k.
Improvements to rocblas_Xgemv_batched and rocblas_Xgemv_strided_batched performance for small m (QMCPACK use).
Improvements to rocblas_Xdot (batched and non-batched) performance when both incx and incy are 1
Improvements to FP32 ONNX BERT performance for MI50
Significant improvements to FP32 Resnext, Inception Convolution performance for gfx908
Slight improvements to FP32 DLRM Terabyte performance for gfx908
Significant improvements to FP32 BDAS performance for gfx908
Significant improvements to FP32 BDAS performance for MI50 and MI60
Added substitution method for small trsm sizes with m <= 64 && n <= 64. Increases performance drastically for small batched trsm.
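As background for the trsm item above, the following is a sketch of textbook forward substitution for one configuration only: left side, lower triangular, no transpose, non-unit diagonal, column-major storage. The function name ref_strsm_lower_left is hypothetical, and this is a plain CPU loop, not the rocBLAS kernel. How the library maps the method onto the GPU for m <= 64 and n <= 64 is not described in these notes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference, not rocBLAS code.
 * Solves L * X = alpha * B in place (X overwrites B), where L is an
 * m x m lower-triangular matrix with a non-unit diagonal and B is m x n.
 * Both matrices are column-major. */
static void ref_strsm_lower_left(int m, int n, float alpha,
                                 const float *L, int lda,
                                 float *B, int ldb)
{
    assert(m >= 0 && n >= 0 && lda >= m && ldb >= m);
    for (int j = 0; j < n; ++j) {
        float *b = B + (size_t)j * ldb;      /* column j of B */
        for (int i = 0; i < m; ++i) {
            float sum = alpha * b[i];
            /* Subtract contributions of the already-solved entries. */
            for (int k = 0; k < i; ++k)
                sum -= L[i + (size_t)k * lda] * b[k];
            b[i] = sum / L[i + (size_t)i * lda];
        }
    }
}
```

Each column of B is solved independently, which is one reason small triangular solves batch well.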

Known Issues

None

rocBLAS-2.22.0 for ROCm 3.5.0

10 Jul 22:50

amdkila

Changelist

add geam complex, geam_batched, and geam_strided_batched
add dgmm, dgmm_batched, and dgmm_strided_batched

Optimized performance

ger
- rocblas_sger, rocblas_dger
- rocblas_sger_batched, rocblas_dger_batched
- rocblas_sger_strided_batched, rocblas_dger_strided_batched
geru
- rocblas_cgeru, rocblas_zgeru
- rocblas_cgeru_batched, rocblas_zgeru_batched
- rocblas_cgeru_strided_batched, rocblas_zgeru_strided_batched
gerc
- rocblas_cgerc, rocblas_zgerc
- rocblas_cgerc_batched, rocblas_zgerc_batched
- rocblas_cgerc_strided_batched, rocblas_zgerc_strided_batched
symv
- rocblas_ssymv, rocblas_dsymv, rocblas_csymv, rocblas_zsymv
- rocblas_ssymv_batched, rocblas_dsymv_batched, rocblas_csymv_batched, rocblas_zsymv_batched
- rocblas_ssymv_strided_batched, rocblas_dsymv_strided_batched, rocblas_csymv_strided_batched, rocblas_zsymv_strided_batched
sbmv
- rocblas_ssbmv, rocblas_dsbmv
- rocblas_ssbmv_batched, rocblas_dsbmv_batched
- rocblas_ssbmv_strided_batched, rocblas_dsbmv_strided_batched
spmv
- rocblas_sspmv, rocblas_dspmv
- rocblas_sspmv_batched, rocblas_dspmv_batched
- rocblas_sspmv_strided_batched, rocblas_dspmv_strided_batched
Improved documentation
Fix argument checking in functions to match legacy BLAS
Fixed conjugate-transpose version of geam
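For readers unfamiliar with geam, the sketch below is a CPU reference of the complex single-precision operation C = alpha * op(A) + beta * op(B), including the conjugate-transpose case mentioned above. The names ref_op, ref_op_elem and ref_cgeam are hypothetical, storage is assumed column-major, and C99 complex types stand in for the library's own complex type. It is meant for checking results by hand, not as the rocBLAS implementation.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical names for illustration; this is not rocBLAS code. */
typedef enum { OP_NONE, OP_TRANS, OP_CONJ_TRANS } ref_op;

/* Element (i, j) of op(M), where M is column-major with leading dimension ld. */
static float complex ref_op_elem(ref_op op, const float complex *M,
                                 int ld, int i, int j)
{
    if (op == OP_NONE)
        return M[i + (size_t)j * ld];
    float complex v = M[j + (size_t)i * ld];   /* transposed access */
    return (op == OP_CONJ_TRANS) ? conjf(v) : v;
}

/* C = alpha * op(A) + beta * op(B), where C is m x n. */
static void ref_cgeam(ref_op opA, ref_op opB, int m, int n,
                      float complex alpha, const float complex *A, int lda,
                      float complex beta, const float complex *B, int ldb,
                      float complex *C, int ldc)
{
    assert(m >= 0 && n >= 0 && ldc >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            C[i + (size_t)j * ldc] =
                alpha * ref_op_elem(opA, A, lda, i, j) +
                beta * ref_op_elem(opB, B, ldb, i, j);
}
```

With OP_CONJ_TRANS on A, a 2 x 1 input contributes a 1 x 2 row whose entries are the complex conjugates of the stored values.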

Known failures

Compilation for GPU Targets
- When using the install.sh script for "all" GPU Targets, which is the default, you must first set an environment variable HCC_AMDGPU_TARGET listing the GPU targets, e.g. HCC_AMDGPU_TARGET=gfx803,gfx900,gfx906,gfx908
- If building for a specific architecture(s) using the -a | --architecture flag, you should also set the environment variable HCC_AMDGPU_TARGET to match.
- Mismatching the environment variable to the -a flag architectures creates builds that may result in SEGFAULTS when running on GPUs which weren't specified.

rocBLAS-2.24.0 for ROCm 3.6.0

11 Jul 00:38

saadrahim

New Features

Improvements to User Guide and Design Document
L1 dot function optimized to utilize shuffle instructions (improvements on bf16, f16, f32 data types)
L1 dot function: added an optimized kernel for the x dot x case
Standardization of L1 rocblas-bench to use device pointer mode to focus on GPU memory bandwidth
Adjustments for hipcc (hip-clang) compiler as standard build compiler and CentOS 8 support
Added Fortran interface for all rocBLAS functions

Known Issues

None

rocBLAS-2.2.0

28 Feb 22:11

amcamd

Changelist:

Fix compilation of TRSV, IAMAX, IAMIN
Add TRSM test sizes
Fix false negative precision failures for f16_r gemm_ex tests
Improvements to documentation and addition of sample for i8_r/i32_r gemm_ex
Tuning for i8_r/i32_r gemm_ex for MIOpen
Add gtest ConfigurableEventListener to reduce Jenkins log file size
Initial refactoring of rocblas-bench
rocblas_dgemm NT tuning

rocBLAS-2.1.0

01 Feb 02:27

bragadeesh

Changelist:

Refactor rocBLAS test framework
Improved performance of i8_r/i32_r rocblas_gemm_ex on gfx906
Addition of simple trsv implementation using trsm
Improved performance of trsm
Tuning improvements for resnet50 problems
Update tuning to use new Tensile solution selection logic
rocblas_gemm_ex performance improvement when ldd == ldc and strideD == strideC
Bug fixes for IAMIN and TRSV
Add Sphinx-based Read the Docs documentation

rocBLAS-2.0.0 for ROCm 2.0

19 Dec 19:46

amcamd

Changelist:

improved performance of fp16/fp32 rocblas_gemm_ex on gfx906
support for i8/i32 rocblas_gemm_ex
update vega-10 resnet50 tuning
refactor testing to be data driven
change gemm-ex API solution index from uint32_t to int32_t
disable gemm and gemm_ex chunking
fix gemv argument checking
add performance script for p1b1 benchmark sizes
refactor gemm code to reduce use of macros
trsm performance regression fix
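The i8/i32 gemm_ex support listed above pairs 8-bit integer inputs with 32-bit integer accumulation. The sketch below is a minimal CPU reference of that idea, assuming column-major storage and no transposes. The function name ref_gemm_i8_i32 is hypothetical, and any input layout requirements of the real rocblas_gemm_ex kernel are not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU reference, not rocBLAS code.
 * C = alpha * A * B + beta * C, with int8 inputs A (m x k) and B (k x n)
 * and int32 accumulation and output C (m x n). Column-major storage. */
static void ref_gemm_i8_i32(int m, int n, int k, int32_t alpha,
                            const int8_t *A, int lda,
                            const int8_t *B, int ldb,
                            int32_t beta, int32_t *C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int32_t acc = 0;
            /* Widen each operand before multiplying so products do not wrap. */
            for (int p = 0; p < k; ++p)
                acc += (int32_t)A[i + (size_t)p * lda] *
                       (int32_t)B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = alpha * acc + beta * C[i + (size_t)j * ldc];
        }
    }
}
```

Accumulating in int32_t matters because a single product of two int8 values can reach 16129, and a short dot product already exceeds the 16-bit range.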
