oneDNN v3.7 release notes #2481

# Performance Optimizations
@vpirogov (Feb 1, 2025):

@sgeor255, @ShanoToni, @t4c1, @Rbiessy, could you please help with release notes content for NVIDIA backend and generic SYCL kernels?

We are primarily looking for two things: performance improvements (stuff that works faster) and new features (stuff that did not work before).


## Intel Architecture Processors
* Improved performance of convolution and matmul primitives on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
* Improved performance of `fp8` matmul primitives with `bf16` and `fp16` bias data type on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
* Improved performance of `int8` RNN primitive on processors with Intel AVX2 and Intel AVX-512 instruction set support.
* Improved performance of `int8` depthwise separable convolution primitive with per-channel zero points on processors with Intel AVX2 and Intel AVX-512 instruction set support.
* Improved performance of softmax primitive with `fp16` and `bf16` data types and relaxed [accumulation mode].
* Improved performance of `int8` matmul primitive with `fp16` output data type.
* Improved performance of the following subgraphs with Graph API:
  * [Gated Multi-Layer Perceptron (Gated MLP)].

[accumulation mode]: https://oneapi-src.github.io/oneDNN/dev_guide_attributes_accumulation_mode.html#doxid-dev-guide-attributes-accumulation-mode

## Intel Graphics Products
* Introduced initial optimizations for Intel GPUs based on Xe3 architecture.
* Improved performance on Intel Arc Graphics for Intel Core Ultra processors (Series 2) (formerly Lunar Lake) and Intel Arc B-series discrete graphics (formerly Battlemage).
* Improved performance of convolution with source zero points by pre-packing compensation.
* Improved performance of backward by data convolution with strides and large filters.
* Improved performance of the following subgraphs with Graph API:
  * Scaled Dot-Product Attention (SDPA) with [implicit causal mask].
  * SDPA with [`int8` or `int4` compressed key and value].
  * Gated MLP.
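
Source zero-point compensation rests on a simple identity: for a quantized source `x` with zero point `zp` and weights `w`, `sum_k (x_k - zp) * w_k = sum_k x_k * w_k - zp * sum_k w_k`. The per-channel weight sums do not depend on the source, so they can be computed once alongside the packed weights. The plain-Python sketch below illustrates the identity on a dot product only; it is not oneDNN's GPU kernel, the helper names are hypothetical, and it ignores padding, where border positions need a different compensation term.

```python
from typing import List


def packed_weight_sums(weights: List[List[int]]) -> List[int]:
    """Per-output-channel sum of weights; computed once, e.g. when weights are packed."""
    return [sum(row) for row in weights]


def dot_with_src_zero_point(src: List[int], weights: List[List[int]],
                            src_zp: int, w_sums: List[int]) -> List[int]:
    """sum_k (src[k] - src_zp) * w[oc][k], computed as a raw dot product minus
    src_zp * sum_k w[oc][k], so the source never has to be shifted."""
    return [sum(x * w for x, w in zip(src, row)) - src_zp * w_sums[oc]
            for oc, row in enumerate(weights)]


def reference(src: List[int], weights: List[List[int]], src_zp: int) -> List[int]:
    """Naive version that subtracts the zero point from every source element first."""
    return [sum((x - src_zp) * w for x, w in zip(src, row)) for row in weights]


src = [12, 7, 0, 255]  # uint8-style source values
weights = [[1, -2, 3, 4], [-5, 6, 0, 2]]
src_zp = 128
w_sums = packed_weight_sums(weights)
print(dot_with_src_zero_point(src, weights, src_zp, w_sums))  # [250, 108]
```

Both functions return the same result; the compensated form only pays for one extra multiply-subtract per output channel.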

[implicit causal mask]: https://oneapi-src.github.io/oneDNN/dev_guide_graph_sdpa.html#doxid-dev-guide-graph-sdpa
[`int8` or `int4` compressed key and value]: https://oneapi-src.github.io/oneDNN/dev_guide_graph_sdpa_compressed_kv.html#doxid-dev-guide-graph-sdpa-compressed-kv
[Gated Multi-Layer Perceptron (Gated MLP)]: https://oneapi-src.github.io/oneDNN/dev_guide_graph_gated_mlp.html#doxid-dev-guide-graph-gated-mlp
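
An implicit causal mask is computed from element positions instead of being passed in as a tensor: the score for query position `i` and key position `j` is kept only when `i >= j`. The sketch below assumes the mask is expressed with the `GenIndex`, `GreaterEqual`, and `Select` operations introduced in this release, and that query and key positions are aligned (top-left causal mask). It is a single-head, pure-Python illustration; `gen_index` is a hypothetical helper, not oneDNN's API.

```python
import math
from typing import List

Matrix = List[List[float]]


def gen_index(rows: int, cols: int, axis: int) -> List[List[int]]:
    """Hypothetical GenIndex-style helper: each element holds its own index along `axis`."""
    return [[r if axis == 0 else c for c in range(cols)] for r in range(rows)]


def causal_sdpa(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head softmax(Q K^T / sqrt(d), causally masked) @ V in pure Python."""
    n_q, n_k, d = len(q), len(k), len(q[0])
    scale = 1.0 / math.sqrt(d)
    scores = [[scale * sum(a * b for a, b in zip(q[i], k[j])) for j in range(n_k)]
              for i in range(n_q)]
    row = gen_index(n_q, n_k, axis=0)  # query position i
    col = gen_index(n_q, n_k, axis=1)  # key position j
    out = []
    for i in range(n_q):
        # GreaterEqual(row, col) builds the keep-mask; Select swaps in -inf elsewhere,
        # so softmax assigns exactly zero weight to future keys.
        masked = [s if row[i][j] >= col[i][j] else -math.inf for j, s in enumerate(scores[i])]
        m = max(masked)  # finite: key 0 is never masked
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        probs = [e / total for e in exps]
        out.append([sum(p * v[j][c] for j, p in enumerate(probs)) for c in range(len(v[0]))])
    return out


q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(causal_sdpa(q, k, v))
```

Because the mask is derived from indices, no `seq x seq` mask tensor has to be allocated or read from memory.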

## AArch64-based Processors
@jondea, @theComputeKid, could you please help summarize the AArch64 improvements?

@Sqvid I think you have a list of our improvements?
cc: @Radu2k @Ryo-not-rio


# Functionality

## Common
* Introduced support for the `select` algorithm in the binary primitive. This functionality is optimized for Intel CPUs.
* Extended quantization support in matmul and reorder with grouped scales and zero points for weights. This functionality is optimized for Intel CPUs and GPUs.
* Introduced initial support for 4-bit floating-point data types `f4_e2m1` and `f4_e3m0` in matmul and reorder, as well as the `e8m0` scale data type in matmul and reorder. This functionality is available on Intel CPUs and GPUs.
* Introduced [`Select`], [`GenIndex`], and [`GreaterEqual`] operations in Graph API.
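
With grouped quantization, each scale and zero point covers a block of consecutive weight elements instead of a whole tensor or a whole channel. The plain-Python sketch below shows the dequantization step, assuming groups run along the reduction dimension `K` of a `K x N` weight matrix with one scale and one zero point per group and output column. The function name and memory layout are illustrative, not oneDNN's API.

```python
from typing import List


def dequantize_grouped(q: List[List[int]], scales: List[List[float]],
                       zero_points: List[List[int]], group_size: int) -> List[List[float]]:
    """Dequantize a K x N weight matrix: rows k in [g*group_size, (g+1)*group_size)
    share scales[g][n] and zero_points[g][n] for each column n."""
    if len(q) % group_size != 0:
        raise ValueError("K must be a multiple of group_size")
    out = []
    for k, row in enumerate(q):
        g = k // group_size  # group index along K
        out.append([(qv - zero_points[g][n]) * scales[g][n] for n, qv in enumerate(row)])
    return out


# K = 4, N = 2, groups of 2 rows: two groups, so scales and zero points are 2 x 2.
q = [[0, 15], [8, 7], [3, 4], [15, 0]]  # unsigned 4-bit codes (0..15)
scales = [[0.5, 0.25], [2.0, 1.0]]
zero_points = [[8, 8], [0, 4]]
print(dequantize_grouped(q, scales, zero_points, group_size=2))
# [[-4.0, 1.75], [0.0, -0.25], [6.0, 0.0], [30.0, -4.0]]
```

Smaller groups track the local value range of the weights more closely, at the cost of storing more scales and zero points.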

[`Select`]: https://oneapi-src.github.io/oneDNN/dev_guide_op_select.html
[`GenIndex`]: https://oneapi-src.github.io/oneDNN/dev_guide_op_genindex.html
[`GreaterEqual`]: https://oneapi-src.github.io/oneDNN/dev_guide_op_greaterequal.html
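
The new 4-bit and scale data types use compact sign/exponent/mantissa encodings. The sketch below decodes `f4_e2m1` and `e8m0` codes following the OCP Microscaling (MX) format definitions, which we assume match oneDNN's; `f4_e3m0` is not covered. `E2M1` has 1 sign bit, 2 exponent bits with bias 1, and 1 mantissa bit, with no infinities or NaNs; `E8M0` is an unsigned power-of-two exponent with bias 127 and `0xFF` reserved for NaN.

```python
import math


def decode_f4_e2m1(bits: int) -> float:
    """Decode a 4-bit E2M1 code: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit."""
    if not 0 <= bits <= 0xF:
        raise ValueError("expected a 4-bit code")
    sign = -1.0 if bits & 0x8 else 1.0
    exp = (bits >> 1) & 0x3
    man = bits & 0x1
    if exp == 0:  # subnormal: 0.m * 2^(1 - bias) = 0.m
        return sign * (man / 2.0)
    return sign * (1.0 + man / 2.0) * 2.0 ** (exp - 1)


def decode_e8m0(bits: int) -> float:
    """Decode an 8-bit E8M0 scale: the power of two 2^(bits - 127); 0xFF encodes NaN."""
    if not 0 <= bits <= 0xFF:
        raise ValueError("expected an 8-bit code")
    if bits == 0xFF:
        return math.nan
    return 2.0 ** (bits - 127)


# All non-negative E2M1 values: [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
print([decode_f4_e2m1(b) for b in range(8)])
# A scaled element: code 0b0111 (6.0) times scale 2^(130 - 127) = 8.0 gives 48.0
print(decode_f4_e2m1(0b0111) * decode_e8m0(130))
```

Since `e8m0` scales are pure powers of two, applying one only shifts the exponent of the value it scales.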

## Intel Architecture Processors
* Introduced support for `fp32` matmul with `fp16` and `bf16` weights.

## Intel Graphics Products
* Introduced stochastic rounding support for convolution, matmul, and reorder, based on the Philox counter-based random number generator.
* Introduced support for strided memory formats in convolution.
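
Stochastic rounding rounds a value up or down at random, with the probability of rounding up equal to how far the value lies past the lower representable neighbor, so the rounding error is zero in expectation. The sketch below rounds to a fixed grid and swaps Python's `random.Random` (a Mersenne Twister) in for the Philox generator oneDNN uses. A counter-based generator such as Philox instead derives each element's random bits from a seed and the element's index, which can keep results reproducible regardless of how work is split across threads.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a neighboring multiple of `step`, rounding up with probability
    equal to the fractional distance past the lower neighbor (unbiased on average)."""
    lower = math.floor(x / step) * step
    frac = (x - lower) / step  # in [0, 1)
    return lower + step if rng.random() < frac else lower


rng = random.Random(42)
samples = [stochastic_round(0.3, 0.25, rng) for _ in range(100_000)]
# The mean is close to 0.3; round-to-nearest would always give 0.25.
print(sum(samples) / len(samples))
```

Each individual result is still one of the two grid neighbors; only the average over many roundings recovers the original value.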

# Usability

## Common
* With the SYCL runtime, memory objects on the CPU engine are now reference-counted and no longer need to be explicitly kept alive for the duration of the primitive execution. This aligns memory object lifetime behavior on CPU and GPU engines.
* Added Graph API examples for [Gated MLP] and [`int4` Gated MLP] patterns.

[Gated MLP]: https://github.com/oneapi-src/oneDNN/blob/rls-v3.7/examples/graph/gated_mlp.cpp
[`int4` Gated MLP]: https://github.com/oneapi-src/oneDNN/blob/rls-v3.7/examples/graph/gated_mlp_int4.cpp
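
For readers unfamiliar with the pattern, Gated MLP (common in transformer feed-forward blocks) computes a gate projection and an up projection of the same input, multiplies them elementwise after an activation on the gate, and applies a down projection. The plain-Python sketch below assumes the Swish (SiLU) gate activation; the linked examples remain the authoritative description of the subgraph oneDNN handles.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (M x K) @ (K x N) product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def swish(v: float) -> float:
    """Swish / SiLU, v * sigmoid(v), written to avoid overflow in math.exp."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def gated_mlp(x: Matrix, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Matrix:
    """(swish(x @ w_gate) * (x @ w_up)) @ w_down, with * applied elementwise."""
    gate = matmul(x, w_gate)
    up = matmul(x, w_up)
    hidden = [[swish(g) * u for g, u in zip(g_row, u_row)] for g_row, u_row in zip(gate, up)]
    return matmul(hidden, w_down)


eye = [[1.0, 0.0], [0.0, 1.0]]
print(gated_mlp([[1.0, -1.0]], eye, eye, [[1.0], [1.0]]))  # approximately [[1.0]]
```

With identity gate and up weights, the hidden values are `sigmoid(1)` and `sigmoid(-1)`, which sum to 1, so the single output is approximately 1.0.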

## Intel Architecture Processors
* Improved verbose diagnostics to better identify issues during dispatching, primitive creation, and kernel creation for Intel CPU and Intel GPU implementations.
* Enabled frame pointer support on Intel64 platforms to improve integration with profilers.

## Intel Graphics Products
* Improved verbose diagnostics for Intel GPU driver compatibility issues.
* Improved support for large tensors in convolution, matmul, and reduction primitives on Intel GPUs.
* Reduced scratchpad usage for NCHW convolution on Intel GPUs.

# Validation
* Extended benchdnn with support and validation for fp8 matmul patterns for tensor tags in RNN primitive validation.
* Extended benchdnn with support for rewriting data types in the test JSON files in the graph driver.
* Extended benchdnn with support and validation for the number of partitions returned from the test JSON files.

# Deprecated Functionality
* Experimental [Graph Compiler] is deprecated and will be removed in future releases.

[Graph Compiler]: https://oneapi-src.github.io/oneDNN/v3.7/dev_guide_graph_compiler.html

# Breaking Changes
* Updated the minimum supported CMake version to 3.13 (was 2.8.12).
* Updated the minimum supported GCC version to 8.0 (was 4.8).
* Updated the minimum supported Clang version to 11.0 (was 3.0).
* Removed support for SYCL versions older than SYCL 2020.
* Enforced `fp32` accumulation mode in `fp16` matmul and inner product primitives on Intel Graphics products without Intel XMX cores. The previous behavior can be restored with the relaxed [accumulation mode].
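
Why accumulation precision matters is easy to reproduce: once a running `fp16` sum grows large enough, each small addend is less than half a unit in the last place and is rounded away. The sketch below emulates an accumulator held in a given precision by rounding after every addition, using the `struct` half (`'e'`) and single (`'f'`) precision formats. It is a worst-case serial sum chosen for illustration, not a model of oneDNN's kernels, which split reductions across many partial sums and so usually lose less accuracy.

```python
import struct


def round_f16(x: float) -> float:
    """Round a Python float to IEEE 754 binary16 (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_f32(x: float) -> float:
    """Round a Python float to IEEE 754 binary32 (struct format 'f')."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, round_acc):
    """Sum values, rounding the running accumulator after every addition,
    which mimics an accumulator held in the given precision."""
    acc = 0.0
    for v in values:
        acc = round_acc(acc + v)
    return acc


# fp16 inputs, as in an fp16 matmul: 4096 copies of 0.1 rounded to fp16.
inputs = [round_f16(0.1)] * 4096
exact = sum(inputs)                      # float64 reference
f16_sum = accumulate(inputs, round_f16)  # fp16 accumulation
f32_sum = accumulate(inputs, round_f32)  # fp32 accumulation
print(exact, f32_sum, f16_sum)
```

Here the `fp32` accumulator matches the exact total of 409.5, because every partial sum is a multiple of 2^-14 that fits in single precision, while the `fp16` accumulator stalls far below it.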

# Thanks to our Contributors

This release contains contributions from the [project core team] as well as Aditya Tewari @aditew01, Alexandra Sidorova @a-sidorova, Atharva Dubey @AD2605, Deb Taylor @deb-intel, Dmitriy Ovchinnikov @inteldimitrius, Fadi Arafeh @fadara01, Hengyu Meng @airMeng, @hmaciak, John Osorio @kala855, Marek Michalowski @michalowski-arm, Michael Froelich @MichaelFroelich, Michał Górny @mgorny, Nikhil Sharma @nikhilfujitsu, Permanence AI Coder @Permanence-AI-Coder, @raistefintel, Ravi Pushkar @rpushkarr, Renato Barros Arantes @renato-arantes, Romain Biessy @Rbiessy, Ryo Suzuki @Ryo-not-rio, @Shreyas-fuj, Varad Ahirwadkar @varad-ahirwadkar, @vishwascm, and Ye Tao @taoye9. We would also like to thank everyone who asked questions and reported issues.

[project core team]: https://github.com/oneapi-src/oneDNN/blob/rls-v3.7/MAINTAINERS.md