Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLEXT: fix failures and make DRC an LLEXT module by default on MTL #9116

Merged
merged 11 commits into from
Jul 16, 2024

Conversation

lyakh
Copy link
Collaborator

@lyakh lyakh commented May 10, 2024

Build all the code, supporting LLEXT, modular
UPDATE: changed to only build DRC as LLEXT on MTL

@lyakh
Copy link
Collaborator Author

lyakh commented May 13, 2024

It has been suggested that the CI might just work if we switch any module to llext and build a deployable layout.

@marc-hb ok, no, a very clear indication that it unfortunately doesn't "just work:" https://sof-ci.01.org/sofpr/PR9116/build4627/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=verify-kernel-boot-log so supposedly some work needs to be done...

@lyakh
Copy link
Collaborator Author

lyakh commented Jun 5, 2024

SOFCI TEST

@lyakh lyakh force-pushed the smatest-mtl branch 2 times, most recently from 46b48b1 to 0eeab10 Compare June 5, 2024 15:44
@marc-hb
Copy link
Collaborator

marc-hb commented Jun 5, 2024

https://github.com/thesofproject/sof/actions/runs/9387214836/job/25849589149?pr=9116

FAILED: zephyr/smart_amp_test_llext/smart_amp_test.llext D:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test_llext/smart_amp_test.llext 
C:\Windows\system32\cmd.exe /C "cd /D D:\a\sof\sof\workspace\build-mtl\zephyr\smart_amp_test_llext && D:\a\sof\sof\zephyr-sdk-0.16.4_windows-x86_64\zephyr-sdk-0.16.4\xtensa-intel_ace15_mtpm_zephyr-elf\bin\xtensa-intel_ace15_mtpm_zephyr-elf-strip.exe  -R.xt.* D:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test.llext.pkg_input -oD:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test_llext/smart_amp_test.llext  && "C:\Program Files\CMake\bin\cmake.exe" -E true"
D:\a\sof\sof\zephyr-sdk-0.16.4_windows-x86_64\zephyr-sdk-0.16.4\xtensa-intel_ace15_mtpm_zephyr-elf\bin\xtensa-intel_ace15_mtpm_zephyr-elf-strip.exe: 'D:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test.llext.pkg_input': No such file

https://sof-ci.01.org/sofpr/PR9116/build5189/build also failed.

@lyakh lyakh force-pushed the smatest-mtl branch 3 times, most recently from 18b1e37 to 660d60b Compare June 7, 2024 08:11
@lyakh
Copy link
Collaborator Author

lyakh commented Jun 7, 2024

https://github.com/thesofproject/sof/actions/runs/9387214836/job/25849589149?pr=9116

FAILED: zephyr/smart_amp_test_llext/smart_amp_test.llext D:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test_llext/smart_amp_test.llext 
C:\Windows\system32\cmd.exe /C "cd /D D:\a\sof\sof\workspace\build-mtl\zephyr\smart_amp_test_llext && D:\a\sof\sof\zephyr-sdk-0.16.4_windows-x86_64\zephyr-sdk-0.16.4\xtensa-intel_ace15_mtpm_zephyr-elf\bin\xtensa-intel_ace15_mtpm_zephyr-elf-strip.exe  -R.xt.* D:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test.llext.pkg_input -oD:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test_llext/smart_amp_test.llext  && "C:\Program Files\CMake\bin\cmake.exe" -E true"
D:\a\sof\sof\zephyr-sdk-0.16.4_windows-x86_64\zephyr-sdk-0.16.4\xtensa-intel_ace15_mtpm_zephyr-elf\bin\xtensa-intel_ace15_mtpm_zephyr-elf-strip.exe: 'D:/a/sof/sof/workspace/build-mtl/zephyr/smart_amp_test.llext.pkg_input': No such file

https://sof-ci.01.org/sofpr/PR9116/build5189/build also failed.

@marc-hb fixed that. Could you or @fredoh9 help check why that's the case? We'd need to compare intermediate results
EDIT: sorry, I meant to ask: why the windows / linux build comparison is failing now

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 7, 2024

https://github.com/thesofproject/sof/actions/runs/9418758155/job/25947397562?pr=9116

Files linux-build mtl/build-sof-staging/sof/intel/sof-ipc4-lib/mtl/community/smart_amp_test.llext and windows-build mtl/build-sof-staging/sof/intel/sof-ipc4-lib/mtl/community/smart_amp_test.llext differ

Do such files have debug symbols? If yes then they shouldn't be compared, not until they're stripped (TODO)

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 8, 2024

Do such files have debug symbols? If yes then they shouldn't be compared, not until they're stripped (TODO)

Actually, there's a better temporary solution: turn off modules when testing reproducible builds. Otherwise code in modules is not tested.

@lyakh
Copy link
Collaborator Author

lyakh commented Jun 12, 2024

Do such files have debug symbols? If yes then they shouldn't be compared, not until they're stripped (TODO)

@marc-hb makes sense, thanks! I'll try adding stripping .comment to Zephyr LLEXT cmake code

@lyakh
Copy link
Collaborator Author

lyakh commented Jun 12, 2024

Actually, there's a better temporary solution: turn off modules when testing reproducible builds. Otherwise code in modules is not tested.

@marc-hb not sure I understand - doesn't the failing test mean, that modules do get tested?

@lyakh lyakh marked this pull request as ready for review June 13, 2024 12:35
@lyakh lyakh changed the title [TEST ONLY] smart-amp-test: build as module on MTL LLEXT: build everything as modules on MTL Jun 13, 2024
@lyakh
Copy link
Collaborator Author

lyakh commented Jun 18, 2024

@wszypelt QB stuck?

@wszypelt
Copy link

@lyakh Unfortunately, I'm trying to solve this issue because more PRs are stuck. As long as I manually added it to the queue, the results should be available within an hour

@lgirdwood
Copy link
Member

@lyakh can you rebase and re-push. Thanks !

@lyakh
Copy link
Collaborator Author

lyakh commented Jun 26, 2024

@lyakh can you rebase and re-push. Thanks !

@lgirdwood It isn't just about rebasing: we're waiting for 2 things to happen: (1) QB support for LLEXT modules @wszypelt , and (2) a solution on how to resolve the failing Linux-Windows comparison @marc-hb #9116 (comment)

@lgirdwood
Copy link
Member

@lyakh can you rebase and re-push. Thanks !

@lgirdwood It isn't just about rebasing: we're waiting for 2 things to happen: (1) QB support for LLEXT modules @wszypelt , and (2) a solution on how to resolve the failing Linux-Windows comparison @marc-hb #9116 (comment)

Ok, lets disable the Windows/Linux comparison here as we know the toolchain has some opens around building shared objects/libraries.

@wszypelt is there an ETA for when internal CI could support this build target ? Fwiw, @mwasko and I were discussing today. My preference would be to have this build option testable by all CIs for best coverage.

@lyakh
Copy link
Collaborator Author

lyakh commented Jun 26, 2024

Ok, lets disable the Windows/Linux comparison here as we know the toolchain has some opens around building shared objects/libraries.

@lgirdwood @marc-hb we could diff --exclude=*.llext "for now"

@lyakh
Copy link
Collaborator Author

lyakh commented Jul 12, 2024

created #9298 as a part of this PR, also addressing Marc's latest comment. We can either merge it all with this one, or split it (again) into two to make them a bit smaller

@lyakh
Copy link
Collaborator Author

lyakh commented Jul 12, 2024

@wszypelt QB seems stuck again...

@lyakh lyakh force-pushed the smatest-mtl branch 2 times, most recently from 2cb61a8 to 79c18cb Compare July 12, 2024 15:42
@marc-hb
Copy link
Collaborator

marc-hb commented Jul 12, 2024

We can either merge it all with this one, or split it (again) into two to make them a bit smaller

Thanks: it would be lovely to have a 1 or 2 days of daily test runs with #9298 alone for CI purposes. This is a pretty big PR and significant change!
https://docs.zephyrproject.org/latest/contribute/contributor_expectations.html#defining-smaller-prs


/* Cannot go below 50000kcps */
if (data.kcps[i] > core_kcps - 50000)
data.kcps[i] = core_kcps - 50000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh I think this has to be cleaned up. At least a magic number definition at start of the file and preferably a Kconfig as this is hardly a constant that covers all platforms.

I'm ok to do this as a follow-up, as this PR has so many moving parts.

@kv2019i
Copy link
Collaborator

kv2019i commented Jul 15, 2024

@softwarecki @ujfalusi @marc-hb FYI, I think this is good to go now. Unless major issues raised today, I'll proceed and merge this today. This does have the upside that it extends coverage for loadable module flows in upstream testing.

Copy link
Member

@abonislawski abonislawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i please wait for @wszypelt windows full scope tests report to clarify if there is no regression with KCPS changes.

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 15, 2024

FYI, I think this is good to go now.

The fix for the shlex / rimage_cmd quoting issue is in the #9298 subset but not here yet.

Please wait for @wszypelt windows full scope tests report to clarify if there is no regression with KCPS changes.

@abonislawski can you please approve #9298 in the mean time? It seems to have no KCPS change.

eddy1021 pushed a commit to eddy1021/sof that referenced this pull request Jul 15, 2024
Passing options one by one in .github/workflows/zephyr.yml was becoming
unwieldy. Reduces Windows/Linux duplication. This also makes local
testing easier; less typing and guess work.

This should also help with thesofproject#9116.

Signed-off-by: Marc Herbert <[email protected]>
lyakh added 11 commits July 16, 2024 08:27
Under windows the Python interpreter has to be called explicitly.
Without it an attempt to execute a Python script fails silently.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Suggested-by: Marc Herbert <[email protected]>
When ipc4_get_drv() fails to find a driver, it might mean, that the
driver needs to be linked dynamically. Printing an error in such a
case wrongly fails CI testing.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Use consistent TABs and spaces in src/samples/audio/Kconfig

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Maximum instance count cannot be zero, they have to be supplied by
respective modules.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Use maximum instance count from TOML when building a manifest.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
When pipelines are destroyed, component drivers' .reset() and .free()
are called. If those drivers were loaded dynamically their memory is
then unmapped. But logging takes place in a low priority task, so it
is important that no logging is done from those methods.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
So far we cannot build identical LLEXT modules under Linux and
Windows, build a monolithic firmware for this test.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Export additional symbols, required for building DRC as an LLEXT
object.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Add support for LLEXT building to drc. Since multiband DRC calls
functions from DRC, we cannot so far build it if DRC is configured as
a module. In the future it should be possible to build both as
modules and to export symbols between them.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Export missing symbols for modular DRC builds and select it as a
module on MTL and LNL. DRC isn't built by default, so we cannot
use CONFIG_LIBRARY_DEFAULT_MODULAR for it.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
If a module contains 0 as its CPC value, the consumption calculation
routine will assign a "safe" maximum value to keep the DSP running at
the maximum clock rate. This works when constructing a pipeline, but
when a pipeline is torn down, returning the maximum clock rate leads
to the clock being reduced to a small value. Fix this by detecting
such cases in pipeline termination code.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
@lyakh
Copy link
Collaborator Author

lyakh commented Jul 16, 2024

The fix for the shlex / rimage_cmd quoting issue is in the #9298 subset but not here yet.

@marc-hb that was a mishap, thanks for keeping an eye on it. Updated now. OTOH 9298 is missing the rimage change, that this PR is incorporating, I'll mark it a draft at least temporarily to prevent accidental merge

@@ -367,6 +375,9 @@ int pipeline_trigger(struct pipeline *p, struct comp_dev *host, int cmd)
{
int ret;
#if CONFIG_KCPS_DYNAMIC_CLOCK_CONTROL
/* FIXME: this must be a platform-specific parameter or a Kconfig option */
#define DSP_MIN_KCPS 50000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no point in keeping KCPS feature with such limit because it means we cannot go lower than highest available clock which is not true because there are basic topologies which we can handle on 38.4.

So why is this change required for llext module?

Copy link
Collaborator Author

@lyakh lyakh Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonislawski what do you mean "lower than highest available clock?" Our highest available clock is hundreds of MHz, i.e. hundreds of thousands of KCPS, isn't it? So this is definitely lower than that.
This patch is needed not only for LLEXT, I think there's a bug in general in the current CONFIG_KCPS_DYNAMIC_CLOCK_CONTROL implementation - see commit description. Using a "safe" value for modules with a 0 CPC isn't implemented correctly. After I've fixed it I started getting lock ups when destroying pipelines. E.g. I had cases when the system was starting with 20000 KCPS, then a pipeline was constructed with a 0-CPC module, so the KCPS went to the maximum. Then as the pipeline was destroyed I tried to return to 20000 KCPS and then I got IPC timeouts in the driver. Introducing a minimum of 50000 (or 40000 worked too) KCPS fixed the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KCPS higher than 38400 will switch to our highest clock and DSP_MIN_KCPS is already higher than that. PRIMARY_CORE_BASE_CPS_USAGE is currently 20000 (higher than needed).
So the one thing is to fix 0CPC problem (which I thought was fixed with previous versions of this commit) but another thing is to introduce such limit like DSP_MIN_KCPS which I dont understand

Copy link
Member

@abonislawski abonislawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this with @lyakh and we will unblock LLEXT now and fix KCPS in separate PR.
The only requirement for llext is to pass CPC value so please make sure it works or fix in the next PR.

@kv2019i kv2019i merged commit 3051607 into thesofproject:main Jul 16, 2024
45 of 49 checks passed
#include <module/module/llext.h>
#include <rimage/sof/user/manifest.h>

#define UUID_DRC 0xda, 0xe4, 0x6e, 0xb3, 0x6f, 0x00, 0xf9, 0x47, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross is this needed with your changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's compatible (UUIDs aren't required to be registered), but yes, this should be added to the registry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants