Releases: ARM-software/optimized-routines
v25.01
This release brings numerous performance and codegen improvements to the math/ and string/ routines. The directory structure has been reorganised and simplified, and the math test framework has been reworked to improve testing capacity for the math routines. The release also includes several bug fixes and documentation updates.
- Update MAINTAINERS
- Improve subdirectory structure
  - Merge pl/ into math/. All math routines can now be built into a single library, and the new structure reduces code duplication.
  - Upgrade the test infrastructure. Intervals, thresholds and other test parameters are now embedded in source files, processed into files generated at compile time, and finally passed to the tester script.
  - Move "lower-quality" routines into dedicated experimental/ directories.
- Changes in config and build system
  - Update minimum required versions of GCC (>= 10) and Clang (>= 5), as a consequence of always building SVE on AArch64.
  - Updates in config options, e.g. removing WANT_SIMD_TESTS and WANT_SVE_MATH.
- Changes in math/ subdirectory
  - Provide vector annotations to allow auto-vectorisation with our mathlib, provided that -ffast-math is enabled and mathlib.h is included.
  - Many codegen and performance improvements in vector routines, fixing most regressions that occurred between GCC 13 and 14: improved memory access, fewer spills, and better usage of the instruction set.
  - Add vector variants for standard and non-standard routines:
    - C99: modf.
    - C23: tanpi.
    - other: sincospi.
  - Fix signature of vector sincos.
  - Fix tests with MPFR.
  - Allow building, testing and benchmarking scalar math routines on macOS and Windows.
- Changes in string/ subdirectory
  - Fix 32-bit Arm build.
  - Improve string benchmarks.
  - Add support for MOPS memcpy/memmove/memset.
  - Improve memset performance.
  - Add new SVE memset implementation.
  - Remove ILP32 support.
- Changes in networking/ subdirectory
  - Fix make install. Library and header might need renaming in the future to be consistent with other components.
Full Changelog: v24.05...v25.01
v24.05 release
- Math routine changes
  - Fixed AdvSIMD vector powf and log for the big-endian target.
  - Fixed an undefined signed shift in the exp10 code, unlikely to cause problems in practice.
  - AdvSIMD pow got minor optimizations.
  - Now there is a build option to disable SIMD and exp10 tests to allow testing libcs without those symbols.
- pl/ directory
  - Several big-endian fixes and code cleanups.
  - This continues to host many math routines with mixed quality.
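
For readers unfamiliar with the "undefined signed shift" class of bug mentioned under the math routine changes, here is a generic sketch of one common form and its usual fix. It is not the actual exp10 patch: the function name `shift_exponent` and the shift amount of 52 are hypothetical, chosen only because moving an integer into the exponent field of a double is a typical place for this pattern.

```c
#include <stdint.h>

/* In C, left-shifting a negative signed integer is undefined behaviour,
   so an expression such as `k << 52` is not safe when k may be negative.
   Converting to an unsigned type first gives well-defined wrap-around
   arithmetic with the bit pattern that was intended.  */
static inline uint64_t
shift_exponent (int64_t k)
{
  return (uint64_t) k << 52;
}
```

Compilers for common targets usually generate the same instruction for both forms, which matches the note that the original issue was unlikely to cause problems in practice.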
v24.01 release
- String routine changes
  - Added memcpy, memmove, memset for MOPS extension.
  - Optimized memcpy by improving code alignment.
  - Fixed GNU property note on ILP32.
- Math routine changes
  - Vector math code now uses ACLE intrinsics and is AArch64 only.
  - Vector math code no longer builds scalar and base PCS variants.
  - Optimized vector sin and cos.
  - Added tgamma128, a binary128 tgammal implementation.
- pl/ directory
  - This continues to host many math routines with mixed quality.
v23.01 release
- Project changes
  - All files are under a new dual license now (MIT OR Apache-2.0 WITH LLVM-exception at the election of the user).
  - Added MAINTAINERS file describing who maintains the subdirectories.
  - Added README.contributors files documenting contribution requirements.
  - Added new pl/ subdirectory for Arm's Performance Library related routines.
- String routine changes
  - Added memset benchmark.
  - Improved strlen and memcpy benchmarks.
  - Added SVE memcpy.
  - Updated arm string functions to support M-profile PACBTI.
  - Merged the MTE and generic versions of strcmp, strncmp, strcpy and stpcpy into one implementation.
  - Optimized memcmp, memchr-mte, memrchr, strchr-mte, strchrnul-mte, strrchr-mte, strlen, strlen-mte, strnlen, strcpy.
- Math routine changes
  - Fixed constants in sinf, cosf and sincosf to be compile time computed even with gcc-12 -frounding-math.
  - Fixed an invalid shift in logf.
  - Support floating-point exceptions in vector math routines when WANT_SIMD_EXCEPT is set.
v21.02 release
- String routine changes
  - Added AArch64 ILP32 ABI support.
  - Fixed SVE strnlen return value.
  - Added MTE related __mtag_tag_region.
  - Added MTE related __mtag_tag_zero_region.
  - Minor code cleanups.
v20.11 release
- New math routines
  - Scalar erff and erf using fma.
v20.08 release
- Bug fixes
  - strcmp-mte nul check
  - strncmp-mte with large size
  - arm memcpy with large size (CVE-2020-6096)
- String routines performance improvements
  - strlen
  - memmove with backward copy
- Benchmarking code for strings and memory routines
  - strlen
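
The "memmove with backward copy" item refers to the case where the destination overlaps the tail of the source, so bytes have to be copied from the end towards the start. The portable sketch below shows only that direction choice. The name `move_bytes` is hypothetical, and the byte-at-a-time loops bear no resemblance to the optimized assembly in string/.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference implementation of an overlap-safe copy.  */
void *
move_bytes (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* Unsigned subtraction wraps around when dst is below src, so this one
     test is true whenever a forward copy cannot overwrite source bytes
     that have not been read yet.  */
  if ((uintptr_t) d - (uintptr_t) s >= n)
    {
      for (size_t i = 0; i < n; i++)
	d[i] = s[i];
    }
  else
    {
      /* dst lies inside [src, src + n): copy backward, last byte first.  */
      for (size_t i = n; i > 0; i--)
	d[i - 1] = s[i - 1];
    }
  return dst;
}
```

For example, moving the first six bytes of "abcdefgh" two positions to the right within the same buffer takes the backward path and yields "ababcdef".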
v20.05 release
- New functionality (64-bit Arm)
  - string: Optimized MTE variants of strlen, strnlen, strchr, strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp, strncmp
  - string: Changes to support BTI
  - string: New optimized memrchr, strnlen
- Performance improvements (Neoverse N1)
  - strchr/strchrnul: 21% improvement on long strings
  - strrchr: 11% improvement
  - strnlen: 130% improvement on long strings, 50% on short strings
- Benchmark and tests
  - string: New memcpy benchmark
  - string: Cleanup testsuite and improve test coverage
v20.02 release
New functionality
- string: New strrchr and stpcpy routines
- string: New Memory Tagging Extension (MTE) variants of strlen and strchr
- math: New vector version of pow(double)
- networking: Optimized ones' complement checksum for 32-bit and 64-bit Arm
Performance improvements
- string: Improved memcpy and memmove (SIMD and non-SIMD) for 64-bit Arm
- string: Improved memset for 64-bit Arm
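
To make the networking item above concrete, the following is a plain reference version of a 16-bit ones' complement checksum in the style of RFC 1071. It is a sketch for illustration, not the optimized routine shipped in networking/: the name `ones_complement_checksum` and its signature are hypothetical, and it returns the complemented sum as used in IP-style headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference checksum: sums the data as big-endian 16-bit
   words, folds the carries back in, and returns the complement.  */
uint16_t
ones_complement_checksum (const uint8_t *data, size_t len)
{
  uint64_t sum = 0;
  size_t i = 0;

  /* Add each complete 16-bit word.  */
  for (; i + 1 < len; i += 2)
    sum += ((uint32_t) data[i] << 8) | data[i + 1];

  /* A trailing odd byte is treated as if padded with a zero byte.  */
  if (i < len)
    sum += (uint32_t) data[i] << 8;

  /* End-around carry: fold the upper bits into the low 16 bits.  */
  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);

  return (uint16_t) ~sum;
}
```

With the sample bytes from RFC 1071 (00 01 f2 03 f4 f5 f6 f7) the folded sum is 0xddf2, so this function returns its complement, 0x220d.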