Releases · ARM-software/optimized-routines

09 Jan 15:57

blapie

3752b98

v25.01 Latest


This release provides numerous performance and codegen improvements in the math/ and string/ routines. The directory structure is re-organised and simplified, and the math test framework is reworked to improve the testing capacity for math routines. The release also includes several bug fixes and documentation updates.

Update MAINTAINERS
Improve subdirectory structure
- Merge pl/ into math/. All math routines can now be built into a
  single library, and the new structure reduces code duplication.
- Upgrade the test infrastructure. Intervals, thresholds and other
  test parameters are now embedded in source files, processed into
  files generated at compile-time, and finally passed to the tester
  script.
- Move "lower-quality" routines into dedicated experimental/
  directories.
Changes in config and build system
- Update minimum required versions of GCC (>= 10) and Clang (>= 5), as
  a consequence of always building SVE on AArch64.
- Update config options, e.g. removing WANT_SIMD_TESTS and
  WANT_SVE_MATH.
Changes in math/ subdirectory
- Provide vector annotations so that compilers can auto-vectorise
  calls to our mathlib routines, provided that -ffast-math is enabled
  and mathlib.h is included.
- Many codegen and performance improvements in vector routines,
  fixing most regressions that occurred between GCC 13 and 14:
  improved memory access, fewer spills, and better use of the
  instruction set.
- Add vector variants for standard and non-standard routines:
  - C99: modf.
  - C23: tanpi.
  - other: sincospi.
- Fix signature of vector sincos.
- Fix tests with MPFR.
- Allow building, testing and benchmarking scalar math routines on
  macOS and Windows.
Changes in string/ subdirectory
- Fix 32-bit Arm build.
- Improve string benchmarks.
- Add support for MOPS memcpy/memmove/memset.
- Improve memset performance.
- Add new SVE memset implementation.
- Remove ILP32 support.
Changes in networking/ subdirectory
- Fix make install. Library and header might need renaming in the
  future to be consistent with other components.

Full Changelog: v24.05...v25.01


23 May 11:25

nsz-arm

v24.05

90f7e62

v24.05 release

Math routine changes
- Fixed AdvSIMD vector powf and log for the big-endian target.
- Fixed an undefined signed shift in the exp10 code, unlikely
  to cause problems in practice.
- AdvSIMD pow got minor optimizations.
- Added a build option to disable SIMD and exp10 tests, so that
  libcs without those symbols can be tested.
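
The signed-shift item above refers to a class of bug that is easy to reproduce: in C, left-shifting a negative signed integer is undefined behaviour. The sketch below is a generic illustration of that class and of the usual remedy (shifting as an unsigned value). It is not the actual exp10 change, and the helper name scale_by_pow2 is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: multiply a normal,
   finite, non-zero double by 2^k by adjusting its exponent field.
   The caller must ensure the result stays in the normal range.  */
double scale_by_pow2(double x, int64_t k)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* Read the bit pattern safely.  */

    /* "bits += k << 52;" would be undefined behaviour for negative k.  */
    bits += (uint64_t)k << 52;        /* Unsigned shift wraps modulo 2^64.  */

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Converting k to uint64_t before the shift keeps the arithmetic well defined, and the modular addition still yields the intended exponent adjustment for negative k.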
pl/ directory
- Several big-endian fixes and code cleanups.
- This continues to host many math routines of mixed quality.


12 Jan 13:18

nsz-arm

v24.01

864fb5e

v24.01 release

String routine changes
- Added memcpy, memmove, memset for MOPS extension.
- Optimized memcpy by improving code alignment.
- Fixed GNU property note on ILP32.
Math routine changes
- Vector math code now uses ACLE intrinsics and is AArch64-only.
- Vector math code no longer builds scalar and base PCS variants.
- Optimized vector sin and cos.
- Added tgamma128, a binary128 tgammal implementation.
pl/ directory
- This continues to host many math routines of mixed quality.


25 Jan 12:34

nsz-arm

v23.01

56e3bf0

v23.01 release

Project changes
- All files are under a new dual license now (MIT OR Apache-2.0 WITH LLVM-exception at the election of the user).
- Added MAINTAINERS file describing who maintains the subdirectories.
- Added README.contributors files documenting contribution requirements.
- Added new pl/ subdirectory for Arm's Performance Library related routines.
String routine changes
- Added memset benchmark.
- Improved strlen and memcpy benchmarks.
- Added SVE memcpy.
- Updated arm string functions to support M-profile PACBTI.
- Merged the MTE and generic versions of strcmp, strncmp, strcpy and stpcpy into one implementation.
- Optimized memcmp, memchr-mte, memrchr, strchr-mte, strchrnul-mte, strrchr-mte, strlen, strlen-mte, strnlen, strcpy.
Math routine changes
- Fixed constants in sinf, cosf and sincosf so that they are computed at compile time even with gcc-12 -frounding-math.
- Fixed an invalid shift in logf.
- Support floating-point exceptions in vector math routines when WANT_SIMD_EXCEPT is set.


18 Feb 14:31

nsz-arm

v21.02

6798b50

v21.02 release

String routine changes
- Added AArch64 ILP32 ABI support.
- Fixed SVE strnlen return value.
- Added MTE related __mtag_tag_region.
- Added MTE related __mtag_tag_zero_region.
- Minor code cleanups.


16 Nov 13:20

nsz-arm

v20.11

58af293

v20.11 release

New math routines
- Scalar erff and erf using fma.


14 Aug 12:49

nsz-arm

v20.08

0f4ae0c

v20.08 release

Bug fixes
- strcmp-mte nul check
- strncmp-mte with large size
- arm memcpy with large size (CVE-2020-6096)
String routines performance improvements
- strlen
- memmove with backward copy
Benchmarking code for strings and memory routines
- strlen


29 May 13:28

nsz-arm

v20.05

ef907c7

v20.05 release

New functionality (64-bit Arm)
- string: Optimized MTE variants of strlen, strnlen, strchr, strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp, strncmp
- string: Changes to support BTI
- string: New optimized memrchr, strnlen
Performance improvements (Neoverse N1)
- strchr/strchrnul: 21% improvement on long strings
- strrchr: 11% improvement
- strnlen: 130% improvement on long strings, 50% on short strings
Benchmark and tests
- string: New memcpy benchmark
- string: Cleanup testsuite and improve test coverage


28 Feb 14:34

nsz-arm

v20.02

a0ad28d

v20.02 release

New functionality
- string: New strrchr and stpcpy routines
- string: New Memory Tagging Extension (MTE) variants of strlen and strchr
- math: New vector version of pow(double)
- networking: Optimized ones' complement checksum for 32-bit and 64-bit Arm
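
For readers unfamiliar with the ones' complement checksum mentioned above, the following is a minimal portable reference sketch of the 16-bit ones' complement sum described in RFC 1071. It is not the optimized routine shipped in networking/ and does not reflect its API; the function name ones_complement_sum16 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper: 16-bit ones' complement sum of a
   buffer, reading words in network (big-endian) byte order.  */
uint16_t ones_complement_sum16(const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    const unsigned char *p = buf;
    uint64_t sum = 0;

    /* Accumulate 16-bit big-endian words in a wide accumulator.  */
    while (len >= 2) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low byte.  */
    if (len == 1)
        sum += (uint32_t)p[0] << 8;

    /* Fold the carries back in (end-around carry).  */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

The Internet checksum stored in a packet header is the bitwise complement of this sum. Optimized implementations typically process wider words per iteration and defer the folding step, which is valid because ones' complement addition is associative and commutative.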

Performance improvements
- string: Improved memcpy and memmove (SIMD and non-SIMD) for 64-bit Arm
- string: Improved memset for 64-bit Arm

