Math: IIR DF1: Optimizations for HiFi4 and HiFi5 #9684
This patch adds iir_df1_hifi4.c, a modified version of iir_df1_hifi3.c. The IIR calculation uses the 32x32 dual MAC. The IIR delay line update is improved with a delay shift, round and pack instruction. The iir->delay address must be aligned to 64 bits / 8 bytes because the fastest 64-bit load/store, which does no alignment handling, is used. In a sof-testbench4 run for the MTL build (scripts/sof-testbench-helper.sh -x -m eqiir) the updated version saves 0.8 MCPS, from 10.6 to 9.8 MCPS, for a 10th order filter. On a real MTL device with a 2nd order high-pass filter the saving is 0.4 MCPS, from 7.8 to 7.4 MCPS. Signed-off-by: Seppo Ingalsuo <[email protected]>
This patch adds iir_df1_hifi5.c, a modified version of iir_df1_hifi4.c. The coefficient and data loads are 128 bits wide when possible. The data load is the fastest type, which does no alignment handling, so the iir->delay address needs to be aligned to 128 bits / 16 bytes. In a sof-testbench4 run the updated version saves 2.1 MCPS, from 10.4 to 8.3 MCPS, for the 10th order filter used. The test run command for the HiFi5 build of sof-testbench4 was "scripts/sof-testbench-helper.sh -x -m eqiir". Signed-off-by: Seppo Ingalsuo <[email protected]>
Setting AE_ZALIGN64() is needed only for aligning writes, not reads. The IIR coefficients are only read in function iir_df1(), so it is not needed there. Ref: HiFi3 DSP User's Guide, page 35, aligning stores. Signed-off-by: Seppo Ingalsuo <[email protected]>
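The alignment requirements stated in the patches above (8 bytes for HiFi4, 16 bytes for HiFi5) can be checked with plain C before handing a buffer to the optimized code. The following is a minimal sketch, not code from this PR: is_aligned() and struct delay_example are hypothetical names, and the buffer size is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of SOF: returns true when ptr is a
 * multiple of align bytes. align must be a power of two.
 */
static bool is_aligned(const void *ptr, size_t align)
{
	return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Hypothetical stand-in for the storage behind iir->delay. The C11
 * _Alignas specifier requests 16-byte alignment, which also satisfies
 * the 8-byte requirement.
 */
struct delay_example {
	_Alignas(16) int32_t delay[8];
};
```

A caller could, for example, reject or re-allocate the delay buffer when is_aligned(delay, 16) is false instead of running the 128-bit load path on it.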
in = x;
for (i = 0; i < nseries; i++) {
	/* Load data */
	AE_LA32X2_IP(delay_y2y1, data_r_align, delay_r);
Actually, I just noticed that the HiFi3 version required 64-bit aligned data while this version doesn't. I changed this to aligned load/store too, since I wasn't sure about the 128-bit alignment for the HiFi5 version, so this might be too cautious.
But I think the largest saving for IIR can be achieved with a new stereo data function that uses common coefficients for L and R, which is the most common use case today. E.g. the EQ component would check for identical coefficients and a stereo channel count, and then select a different processing core.
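The selection idea in the comment above could look roughly like the sketch below. Everything here is hypothetical and only illustrates the check; the struct, the enum, and the assumed count of 7 coefficient words per biquad are not taken from the SOF sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch only: coefficient words stored per biquad */
#define EX_COEFS_PER_BIQUAD 7

/* Hypothetical per-channel filter description */
struct ex_iir_config {
	int biquads;         /* number of biquad sections */
	const int32_t *coef; /* biquads * EX_COEFS_PER_BIQUAD words */
};

enum ex_path {
	EX_PATH_PER_CHANNEL,   /* run the mono filter once per channel */
	EX_PATH_STEREO_SHARED, /* one pass over L and R with common coefficients */
};

/* True when both channels describe the same filter */
static bool ex_same_coefficients(const struct ex_iir_config *l,
				 const struct ex_iir_config *r)
{
	size_t bytes;

	if (l->biquads != r->biquads)
		return false;

	if (l->coef == r->coef)
		return true;

	bytes = sizeof(int32_t) * EX_COEFS_PER_BIQUAD * (size_t)l->biquads;
	return memcmp(l->coef, r->coef, bytes) == 0;
}

/* Pick the shared stereo path only for two channels with equal filters */
static enum ex_path ex_select_path(int channels, const struct ex_iir_config *cfg)
{
	if (channels == 2 && ex_same_coefficients(&cfg[0], &cfg[1]))
		return EX_PATH_STEREO_SHARED;

	return EX_PATH_PER_CHANNEL;
}
```

The check runs once at configuration time, so its cost does not affect the per-sample MCPS figures discussed in this PR.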
LGTM - can the hifi4 parts be tested in CI today, or do we need a Kconfig/topology update to test?
There's good coverage for hifi4 IIR in CI, at least against gross failures such as a fw crash or bad audio quality. It's part of many alsabat sine quality checks and the testbench run with chirp. I've tested this myself with process_test('eqiir', 32, 32, 48000, 1, 1, 'xt-run') for both hifi4 and hifi5. All the objective quality measurements look similar to before, though the change is not bit exact. The MAC operation that I now use has asymmetric rounding, since the symmetric rounding of the previous version isn't available for it. My feeling is that both are equally good, with slightly different pros and cons (minor linearity and offset differences).
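The rounding difference mentioned above can be illustrated in portable C. This is a sketch under the usual fixed-point meaning of the terms, assumed here rather than taken from the HiFi documentation: asymmetric rounding sends ties toward +infinity, symmetric rounding sends ties away from zero. The function names are hypothetical, shift must be at least 1, and overflow of the accumulator is ignored.

```c
#include <stdint.h>

/* Floor division by 2^shift without relying on the implementation-defined
 * right shift of negative values.
 */
static int64_t ex_floor_div_pow2(int64_t v, int shift)
{
	int64_t d = (int64_t)1 << shift;
	int64_t q = v / d; /* C division truncates toward zero */

	if (v % d < 0)
		q--; /* adjust truncation to floor for negative remainders */

	return q;
}

/* Asymmetric rounding: add half an LSB, then floor. Ties go toward
 * +infinity, so -1.5 becomes -1.
 */
static int64_t ex_round_asym(int64_t acc, int shift)
{
	return ex_floor_div_pow2(acc + ((int64_t)1 << (shift - 1)), shift);
}

/* Symmetric rounding: round the magnitude, then restore the sign. Ties go
 * away from zero, so -1.5 becomes -2.
 */
static int64_t ex_round_sym(int64_t acc, int shift)
{
	int64_t half = (int64_t)1 << (shift - 1);

	if (acc >= 0)
		return (acc + half) >> shift;

	return -((-acc + half) >> shift);
}
```

The two functions differ only for negative values that lie exactly halfway between two output steps, which is consistent with the results being close but not bit exact.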
SOFCI TEST
The sof-docs and Intel LNL failures are all known and tracked in https://github.com/thesofproject/sof/issues?q=is%3Aissue+is%3Aopen+label%3A%22Known+PR+Failures%22+