Releases: ROCm/rocBLAS

rocBLAS-14.3.0 for ROCm1.9

12 Oct 03:00

Changelist:

  • add rocblas_gemm_strided_batched_ex for mixed precision support
  • tested on ROCm 1.9
  • fix chunking of A and B matrices
  • expand testing of rocblas_gemm
  • sgemm and hgemm tuning on gfx906 for Resnet50 from Tensile V4.6.0
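
To make the strided-batched idea concrete, here is a small CPU reference in C++. It is only a sketch of the usual strided-batched convention (batch `b` of each matrix starts `b * stride` elements after the first one), not the rocBLAS implementation, and it does not reproduce the real `rocblas_gemm_strided_batched_ex` signature. The function name `gemm_strided_batched_ref`, the no-transpose restriction, and the choice of `float` storage with a `double` accumulator as a stand-in for mixed precision are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not a rocBLAS function.
// For every batch b: C_b = alpha * A_b * B_b + beta * C_b (no transposes),
// where A_b starts at a + b * stride_a, and likewise for B and C.
// Matrices are column-major, as in BLAS. Inputs and outputs are float,
// while the accumulator is double (assumed stand-in for mixed precision).
void gemm_strided_batched_ref(int m, int n, int k, float alpha,
                              const float* a, int lda, std::ptrdiff_t stride_a,
                              const float* b, int ldb, std::ptrdiff_t stride_b,
                              float beta,
                              float* c, int ldc, std::ptrdiff_t stride_c,
                              int batch_count)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int batch = 0; batch < batch_count; ++batch) {
        // Locate the matrices of this batch using the strides.
        const float* a_b = a + batch * stride_a;
        const float* b_b = b + batch * stride_b;
        float* c_b = c + batch * stride_c;
        for (int col = 0; col < n; ++col) {
            for (int row = 0; row < m; ++row) {
                double acc = 0.0; // wider type than the stored data
                for (int p = 0; p < k; ++p) {
                    const std::ptrdiff_t ia = row + static_cast<std::ptrdiff_t>(p) * lda;
                    const std::ptrdiff_t ib = p + static_cast<std::ptrdiff_t>(col) * ldb;
                    acc += static_cast<double>(a_b[ia]) * static_cast<double>(b_b[ib]);
                }
                float& out = c_b[row + static_cast<std::ptrdiff_t>(col) * ldc];
                out = static_cast<float>(alpha * acc + beta * out);
            }
        }
    }
}
```

A reference like this is mainly useful for checking GPU results on small problems; the strides let all batches live in one allocation instead of an array of pointers.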

Known failures:

  • known dgemm failures for m,n < 16

enable gfx906 support

21 Sep 17:44
8490ca9

A small incremental release to enable gfx906 support. To get gfx906 support, ROCm 1.9 or later must be used to build rocBLAS.

rocBLAS-14.1.2 for ROCm1.8.2

12 Sep 15:56

Changelist:

  • add initial rocblas_gemm_ex for mixed precision support and as a foundation for future capabilities
  • use Tensile 4.5.0 for bug fixes and performance improvements
  • separate tests into quick, pre_checkin, and nightly
  • add sweep tests for gemm
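
The point of mixed precision is that data can be stored in one type while arithmetic runs in a wider one. The sketch below shows that idea on a dot product, which is the inner loop of a GEMM. It is plain C++ and does not call rocBLAS; the helper name `dot_with_compute_type` and the `float`/`double` pairing (standing in for half-precision storage with single-precision compute) are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of rocBLAS.
// Storage is the type the vectors are kept in; Compute is the type used
// for the running sum. The result is converted back to Storage at the end.
template <typename Storage, typename Compute>
Storage dot_with_compute_type(const std::vector<Storage>& x,
                              const std::vector<Storage>& y)
{
    assert(x.size() == y.size());
    Compute acc = Compute(0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Widen each operand before multiplying and accumulating.
        acc += static_cast<Compute>(x[i]) * static_cast<Compute>(y[i]);
    }
    return static_cast<Storage>(acc);
}
```

Calling it as `dot_with_compute_type<float, float>(x, y)` and as `dot_with_compute_type<float, double>(x, y)` on the same data shows how much the accumulator type alone can change the result when small terms are added to a large partial sum.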

rocBLAS 14.1.1 for ROCm 1.8.2

10 Aug 04:02

Changelist:

  • update hgemm asm_full YAML file for performance; re-train hgemm hip_lite YAML file
  • new YAML files with PreciseBoundsCheck disabled
  • update hgemm asm_full YAML file, source and VW=2 for m,n,k <= 32
  • update hgemm asm_full YAML file, source and VW=1 for m,n,k == 1
  • add strided_batched tests for hgemm
  • correct gemm test matrix initialization
  • change cmake and source files to support hip-clang
  • change from __fp16 to _Float16

rocBLAS 14.1.0 for ROCm1.8.2

29 Jun 15:33

Changelist:

  • partition the gemm m and n dimensions to avoid offsets exceeding 32 bits
  • fix set_get_matrix memory leak
  • improve TRSM performance and make it asynchronous
  • use hip_device target for ROCm 1.8.2
  • improve gemm-strided-batched testing
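
The partitioning item above addresses a general problem: in a column-major matrix the element offset is `row + col * ld`, and for large matrices that product no longer fits in a signed 32-bit integer. The release notes do not describe how rocBLAS splits the problem, so the following is only one plausible sketch. The names `Chunk` and `partition_dimension` are hypothetical, and the rule used (keep `count * stride` within the 32-bit limit for every chunk) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical types and function, not taken from rocBLAS.
struct Chunk {
    std::int64_t first; // index of the first row or column in the chunk
    std::int64_t count; // number of rows or columns in the chunk
};

// Split `extent` rows or columns into chunks so that, within each chunk,
// count * stride stays at or below max_offset (default: INT32_MAX).
// The start of each chunk is meant to be applied to the base pointer
// with 64-bit arithmetic, so only in-chunk offsets must fit in 32 bits.
std::vector<Chunk> partition_dimension(
    std::int64_t extent, std::int64_t stride,
    std::int64_t max_offset = std::numeric_limits<std::int32_t>::max())
{
    assert(extent >= 0 && stride > 0 && stride <= max_offset);
    const std::int64_t max_count = max_offset / stride;
    std::vector<Chunk> chunks;
    for (std::int64_t first = 0; first < extent; first += max_count) {
        chunks.push_back({first, std::min(max_count, extent - first)});
    }
    return chunks;
}
```

For the n dimension, `stride` would be the leading dimension of the matrix; each chunk can then be handed to a separate gemm call whose pointer has been advanced by `first * stride` elements.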

rocBLAS-14.0.0 for ROCm1.7.1

15 May 23:05

Changelist:

  • fix Xtrsm for large size ldb
  • fix set_get_matrix for large size
  • fix Xgemm test for large size
  • additional training for ResNet sizes
  • fix dot, asum, nrm2

rocBLAS-12.3.1 for ROCm1.7.1

26 Apr 22:08

Changelist:

  • add gemm_kernel_name and gemm_strided_batched_kernel_name
  • Tensile training for ResNet1x1
  • add mi25 Device 6860 to vega10
  • set AMDGPU_TARGETS to gfx803;gfx900
  • fix bug in kernel2 in sum, dot, nrm2

rocBLAS-12.2.1 for ROCm 1.7.1

11 Apr 16:52

Changelist:

  • add function syr
  • use Tensile v4.0.1
  • add Exact sizes to Tensile yaml files

rocBLAS-0.12.1.0 for ROCm 1.7.1

12 Mar 14:41

Changelist:

  • fix dependency installation

rocBLAS-0.12.0.0 release for ROCm 1.7.1

07 Mar 22:43

Same source as the rocBLAS-0.12.0.0 release for ROCm 1.7.0, but for ROCm 1.7.1.