
[GPU] Revert jit::gemm handling for f32 src, f16 fpmath #2467

Open — wants to merge 1 commit into main
Conversation

kealan-barbieri (Contributor)

Description

Revert to sending cases with f32 activations and the f16 fpmath setting to the reference implementation. These cases incur additional register pressure from an f32->f16 reorder that isn't supported by the f16 gemm strategies:

onednn_verbose,v1,info,oneDNN v3.7.0 (commit 80b61e36646c9d1ab64d439a5bf1ea0966c6f0d9)
onednn_verbose,v1,info,cpu,runtime:OpenMP,nthr:224
onednn_verbose,v1,info,cpu,isa:Intel AVX-512 with float16, Intel DL Boost and bfloat16 support 
onednn_verbose,v1,info,gpu,runtime:OpenCL
onednn_verbose,v1,info,gpu,engine,opencl device count:1 
onednn_verbose,v1,info,gpu,engine,0,name:Intel(R) Data Center GPU Max 1100,driver_version:24.39.31294,binary_kernels:enabled
onednn_verbose,v1,info,experimental features are enabled
onednn_verbose,v1,info,use batch_normalization stats one pass is enabled
onednn_verbose,v1,info,GPU convolution v2 is disabled
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,primitive,error,gpu,jit::gemm,Insufficient registers in requested bundle,src/gpu/intel/jit/gemm/gen_gemm_kernel.cpp:959
Error: Function 'create_primitive' at (/nfs/pdx/home/skazakov/hal9000_skazakov/DNNL_JIRA/oneDNN/tests/benchdnn/dnnl_common.hpp:422) returned 'runtime_error'
Error: Function 'init_prim' at (/nfs/pdx/home/skazakov/hal9000_skazakov/DNNL_JIRA/oneDNN/tests/benchdnn/dnnl_common.hpp:475) returned '1'
Error: Function 'createit' at (/nfs/pdx/home/skazakov/hal9000_skazakov/DNNL_JIRA/oneDNN/tests/benchdnn/matmul/matmul.cpp:884) returned '1'
Error: Function 'create' at (/nfs/pdx/home/skazakov/hal9000_skazakov/DNNL_JIRA/oneDNN/tests/benchdnn/utils/task.hpp:49) returned '1'
0:UNTESTED_FAILED __REPRO: --matmul --engine=gpu --allow-enum-tags-only=false --dt=f32:u8:f32 --stag=ab --wtag=ba --dtag=ab --bia_dt=f32 --attr-scales=wei:per_oc --attr-zero-points=wei:per_oc:u8 --attr-scratchpad=user --attr-fpmath=f16:true 16384x512:512x512
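For context (an illustration, not code from this PR): the f16 fpmath mode permits the implementation to down-convert f32 inputs to f16 before the multiply. A minimal NumPy sketch of that f32->f16 conversion and its precision cost:

```python
import numpy as np

# f32 activations, as in the failing repro (--dt=f32:u8:f32)
src = np.array([1.0001, 3.14159], dtype=np.float32)

# Under f16 fpmath, f32 inputs may be down-converted to f16 before the
# multiply; this models that f32 -> f16 reorder.
src_f16 = src.astype(np.float16)

# f16 has a 10-bit mantissa, so digits past roughly the 3rd decimal are
# lost: 1.0001 becomes exactly 1.0 in f16.
print(src_f16[0] == np.float16(1.0))  # True
```

In the jit::gemm path it is the extra reorder itself, not the precision loss, that exhausts registers under the f16 strategies.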

Similar cases should instead use --attr-fpmath=strict:true to leverage the f32 jit::gemm strategies.
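The failing repro above, adjusted per that recommendation (same problem; only the fpmath attribute changes):

```
--matmul --engine=gpu --allow-enum-tags-only=false --dt=f32:u8:f32 --stag=ab --wtag=ba --dtag=ab --bia_dt=f32 --attr-scales=wei:per_oc --attr-zero-points=wei:per_oc:u8 --attr-scratchpad=user --attr-fpmath=strict:true 16384x512:512x512
```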

Fixes: MFDNN-13045

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

Bug fixes

  • Have you included information on how to reproduce the issue (either in a GitHub issue or in this PR)?
  • Have you added relevant regression tests?

@kealan-barbieri kealan-barbieri requested a review from a team as a code owner January 21, 2025 21:45
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Jan 21, 2025
```diff
-if (fpmath_bf16
-        && (utils::one_of(Type::f32, problem_.Ta, problem_.Tb)
-            || (problem_.Ta.isF8() || problem_.Tb.isF8()))
+if (fpmath_bf16 && (problem_.Ta.isF8() || problem_.Tb.isF8())
```
@rjoursler (Contributor), Jan 22, 2025:
The issue in the given case is that we should be matching the f32 input against [SB] and [SH], not B and H. Can we just extend the lambda inputs to add_mode_matches and drop these if statements? The add_mode_matches function already adds all matching strategy conversions.
