perf_cnt: improve avg/peak accuracy for component perf measurements #9664

Merged
merged 4 commits into from
Nov 22, 2024

kv2019i
Collaborator

@kv2019i kv2019i commented Nov 19, 2024

Build on top of #9661 and improve the PERFORMANCE_COUNTERS infrastructure:

  • make individual performance tracking points build-time configurable
  • reduce the number of parallel measurements in the performance overlay (focus on per-component analysis, for which this overlay is primarily used)
  • implement alternate reporting to reduce logging overhead

Marking as draft as this requires #9661 to be merged first.

@kv2019i
Collaborator Author

kv2019i commented Nov 19, 2024

With all patches applied, results look like this:

[1539781.868583] <inf> component: comp_copy: comp:1 0x3 perf comp_copy samples 48 period 1000 cpu avg 1237 peak 1250 3
[1539781.868605] <inf> component: comp_copy: comp:1 0x10006 perf comp_copy samples 48 period 1000 cpu avg 1752 peak 2068 65
[1539781.868626] <inf> component: comp_copy: comp:1 0x10004 perf comp_copy samples 48 period 1000 cpu avg 5588 peak 5640 478
[1539781.869616] <inf> component: comp_copy: comp:0 0x4 perf comp_copy samples 48 period 1000 cpu avg 2903 peak 2968 221
[1539781.869628] <inf> component: comp_copy: comp:0 0x6 perf comp_copy samples 48 period 1000 cpu avg 1605 peak 1892 795
[1539781.869658] <inf> component: comp_copy: comp:0 0x2 perf comp_copy samples 48 period 1000 cpu avg 1996 peak 2070 1023

With logging overhead and HDA ISRs removed, peaks are now in hundreds of DSP cycles.
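
For anyone who wants to pull these numbers out of a log by hand, here is a minimal parsing sketch. It is not the code used by sof_perf_analyzer.py; the names `PerfSample`, `parse_perf_line` and `format_comp_id` are made up for illustration. It assumes the number after `comp:` is the pipeline ID, and it keeps the last, unlabeled number as `trailing` because the log line does not say what it means.

```python
import re
from typing import NamedTuple, Optional


class PerfSample(NamedTuple):
    pipeline: int   # number after "comp:" (assumed to be the pipeline ID)
    comp_id: int    # hex component ID, e.g. 0x10006
    samples: int
    period: int
    cpu_avg: int
    cpu_peak: int
    trailing: int   # last, unlabeled field; meaning not stated in the log


# Matches the "perf comp_copy" lines shown above; anything else is ignored.
PERF_RE = re.compile(
    r"comp:(?P<pipeline>\d+)\s+(?P<comp_id>0x[0-9a-fA-F]+)\s+perf comp_copy"
    r"\s+samples\s+(?P<samples>\d+)\s+period\s+(?P<period>\d+)"
    r"\s+cpu avg\s+(?P<avg>\d+)\s+peak\s+(?P<peak>\d+)\s+(?P<trailing>\d+)"
)


def parse_perf_line(line: str) -> Optional[PerfSample]:
    """Return the parsed fields of one perf log line, or None if no match."""
    m = PERF_RE.search(line)
    if m is None:
        return None
    return PerfSample(
        pipeline=int(m["pipeline"]),
        comp_id=int(m["comp_id"], 16),
        samples=int(m["samples"]),
        period=int(m["period"]),
        cpu_avg=int(m["avg"]),
        cpu_peak=int(m["peak"]),
        trailing=int(m["trailing"]),
    )


def format_comp_id(sample: PerfSample) -> str:
    """Build a fixed-width ID such as '1-0x010006' from pipeline and comp ID."""
    return f"{sample.pipeline}-{sample.comp_id:#08x}"
```

Zero-padding the hex ID to a fixed width gives aligned columns in post-processing without touching the firmware log format.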

Member

@lgirdwood lgirdwood left a comment

LGTM. I see it includes some patches from the previous PR, which can be rebased.

@lgirdwood
Member

With all patches applied, results look like this:

[1539781.868583] <inf> component: comp_copy: comp:1 0x3 perf comp_copy samples 48 period 1000 cpu avg 1237 peak 1250 3
[1539781.868605] <inf> component: comp_copy: comp:1 0x10006 perf comp_copy samples 48 period 1000 cpu avg 1752 peak 2068 65
[1539781.868626] <inf> component: comp_copy: comp:1 0x10004 perf comp_copy samples 48 period 1000 cpu avg 5588 peak 5640 478
[1539781.869616] <inf> component: comp_copy: comp:0 0x4 perf comp_copy samples 48 period 1000 cpu avg 2903 peak 2968 221
[1539781.869628] <inf> component: comp_copy: comp:0 0x6 perf comp_copy samples 48 period 1000 cpu avg 1605 peak 1892 795
[1539781.869658] <inf> component: comp_copy: comp:0 0x2 perf comp_copy samples 48 period 1000 cpu avg 1996 peak 2070 1023

With logging overhead and HDA ISRs removed, peaks are now in hundreds of DSP cycles.

One thing: can we align the comp IDs so that the text lines up for each field? That makes it easier to parse.
Btw, why can't we print the module name for humans to read?

@kv2019i
Collaborator Author

kv2019i commented Nov 19, 2024

@lgirdwood wrote:

One thing: can we align the comp IDs so that the text lines up for each field? That makes it easier to parse.
Btw, why can't we print the module name for humans to read?

Probably best not to touch the formatting, as these lines are already parsed by sof-test/tools/sof_perf_analyzer.py, which also adds the module names (using info from the kernel log) and makes the output more human-readable. E.g.:

# run test case, capture kernel (dmesg.txt) and FW (mtrace.txt) logs
$ sof-test/tools/sof_perf_analyzer.py --kmsg dmesg.txt mtrace.txt
     COMP_ID                   COMP_NAME CPU_AVG(MIN) CPU_AVG(AVG)  \
0  0-0x000002                   mixin.0.1        1.991        1.996   
1  0-0x000004      host-copier.0.playback        2.903        2.904   
2  0-0x000006                    gain.0.1        1.605        1.615   
3  1-0x000003                  mixout.1.1        1.230        1.235   
4  1-0x010004  alh-copier.SDW0-Playback.0        5.586        5.588   
5  1-0x010006                    gain.1.1        1.746        1.761   
 
  CPU_AVG(MAX) CPU_PEAK(MIN) CPU_PEAK(AVG) CPU_PEAK(MAX) PEAK(MAX)/AVG(AVG)  \
0        2.004         2.062         2.068         2.070              1.037   
1        2.905         2.968         3.285         4.551              1.567   
2        1.655         1.884         2.410         4.444              2.751   
3        1.237         1.250         1.355         1.742              1.411   
4        5.592         5.624         5.647         5.692              1.019   
5        1.802         2.056         2.599         4.680              2.658   
 
   MODULE_CPC  
0        2993  
1        4356  
2        2422  
3        1852  
4        8381  
5        2640  
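
As a worked check of the last ratio column, the sketch below recomputes PEAK(MAX)/AVG(AVG) as CPU_PEAK(MAX) divided by CPU_AVG(AVG), using the rounded values printed above. That reading of the column is an assumption taken from its name, and the helper names are illustrative. Because the inputs are already rounded to three decimals, the recomputed ratios can differ from the reported ones in the last digit.

```python
# (COMP_NAME, CPU_AVG(AVG), CPU_PEAK(MAX), reported PEAK(MAX)/AVG(AVG)),
# copied from the analyzer output above.
ROWS = [
    ("mixin.0.1", 1.996, 2.070, 1.037),
    ("host-copier.0.playback", 2.904, 4.551, 1.567),
    ("gain.0.1", 1.615, 4.444, 2.751),
    ("mixout.1.1", 1.235, 1.742, 1.411),
    ("alh-copier.SDW0-Playback.0", 5.588, 5.692, 1.019),
    ("gain.1.1", 1.761, 4.680, 2.658),
]


def peak_to_avg(cpu_avg_avg: float, cpu_peak_max: float) -> float:
    """Ratio of the worst observed peak to the mean of the averages."""
    return cpu_peak_max / cpu_avg_avg


def max_deviation(rows) -> float:
    """Largest absolute gap between recomputed and reported ratios."""
    return max(abs(peak_to_avg(avg, peak) - reported)
               for _, avg, peak, reported in rows)


if __name__ == "__main__":
    for name, avg, peak, reported in ROWS:
        ratio = peak_to_avg(avg, peak)
        print(f"{name:28s} {ratio:.3f} (reported {reported:.3f})")
```

A ratio close to 1 means the component's worst peak stays near its typical load; the two gain modules stand out with ratios above 2.6.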

The performance counter results are delivered via the logging
subsystem and the logging overhead can interfere with the measurements
themselves. To mitigate the impact, only a small set of performance
counters should be enabled at the same time in a given build.

To enable this, break the CONFIG_PERFORMANCE_COUNTERS Kconfig option
into more fine-grained options and add separate options to enable LL
task and audio component performance tracing.

Signed-off-by: Kai Vehmanen <[email protected]>

Disable CONFIG_PERFORMANCE_COUNTERS_LL_TASKS and
CONFIG_SCHEDULE_LL_STATS_LOG by default in the performance overlay. This
reduces logging overhead and makes the component level peak traces more
reliable.

The logging overhead has minimal impact on reported averages, but
can be seen in peak execution measurements.

Signed-off-by: Kai Vehmanen <[email protected]>

Implement simple alternate reporting for perf_cnt_average()
and task_perf_cnt_avg(). By calling the reporting function only
for every other measurement window, the overhead of reporting
can be filtered out from data. This mostly affects the peak
cycle reporting. For average values, reporting has only minimal
impact.

Signed-off-by: Kai Vehmanen <[email protected]>
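
The idea in this commit message can be modeled roughly as follows. This is a hedged Python model, not the firmware code: perf_cnt_average() and task_perf_cnt_avg() are C, and the class `AlternatingWindow`, the function `simulate`, and the fixed `log_cost` are all hypothetical. The model assumes the cost of emitting a report lands on the first measurement of the following window, so that window is accumulated but never reported.

```python
from typing import Callable, List, Tuple


class AlternatingWindow:
    """Accumulate cycle counts per window and report every other window."""

    def __init__(self, window: int, report: Callable[[int, int], None]):
        self.window = window
        self.report = report
        self.count = 0
        self.total = 0
        self.peak = 0
        self.skip_next = False  # True while the current window follows a report

    def add(self, cycles: int) -> None:
        self.total += cycles
        self.peak = max(self.peak, cycles)
        self.count += 1
        if self.count < self.window:
            return
        # Window complete: report it unless it was disturbed by a report.
        if not self.skip_next:
            self.report(self.total // self.window, self.peak)
        self.skip_next = not self.skip_next
        self.count = self.total = self.peak = 0


def simulate(windows: int, window: int = 4, base: int = 1000,
             log_cost: int = 500) -> List[Tuple[int, int]]:
    """Feed a constant load where each report inflates the next measurement."""
    reports: List[Tuple[int, int]] = []
    pending = [0]  # logging overhead charged to the next measurement

    def report(avg: int, peak: int) -> None:
        reports.append((avg, peak))
        pending[0] = log_cost

    acc = AlternatingWindow(window, report)
    for _ in range(windows * window):
        cycles = base + pending[0]
        pending[0] = 0
        acc.add(cycles)
    return reports
```

In this model, `simulate(4)` produces two reports, and neither contains the inflated measurement, since the window carrying the logging cost is the one that gets skipped. The price is that half of the measurement windows go unreported.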

On many Intel platforms, the HD-DMA interrupts can interfere
with component level performance measurements. Add a comment
on how to disable the interrupts when doing component performance
analysis.

Signed-off-by: Kai Vehmanen <[email protected]>
@kv2019i kv2019i force-pushed the 202411-perfcnt-improve branch from eabf5a9 to cc4a331 Compare November 19, 2024 13:34
@kv2019i
Collaborator Author

kv2019i commented Nov 19, 2024

#9661 merged, rebased this PR, marking ready for review.

@kv2019i kv2019i requested a review from singalsu November 19, 2024 16:09
@@ -1,7 +1,10 @@
 CONFIG_PERFORMANCE_COUNTERS=y
 CONFIG_PERFORMANCE_COUNTERS_COMPONENT=y
-CONFIG_PERFORMANCE_COUNTERS_LL_TASKS=y
+# disable ll task level statistics to reduce logging overhead
+#CONFIG_PERFORMANCE_COUNTERS_LL_TASKS=y
Collaborator

BTW, perf_overlay.conf is for performance monitoring, not for performance enhancement (why would you not have enhanced performance in your default configuration), right? The name sounds potentially a bit confusing. Can we rename it, or at least add a comment at the top?

Collaborator Author

@kv2019i kv2019i Nov 20, 2024

@lyakh I'll let @singalsu comment, I think he's the only known user of this at the moment. Not sure if we have some CI jobs somewhere that have the name hardcoded. Not sure it's worth the hassle TBH.

Collaborator

No problem to rename it. I don't think we have CI builds and tests with it. It was planned but our plans got changed.

Member

This is fine as it is. This feature is for performance monitoring, but it can have an impact on peak values when logging.

@lgirdwood lgirdwood merged commit 7eebf02 into thesofproject:main Nov 22, 2024
45 of 47 checks passed

4 participants