
waveshaper: Avoid allocations while resampling #287

Merged: 4 commits, Jun 18, 2023

Conversation

uklotzde
Contributor

Fixes #143.

@github-actions

Benchmark result:


bench_ctor
  Instructions:             5063057 (+0.232196%)
  L1 Accesses:              7559138 (+0.208008%)
  L2 Accesses:                54122 (+0.024026%)
  RAM Accesses:               61585 (+0.042236%)
  Estimated Cycles:         9985223 (+0.167186%)

bench_sine
  Instructions:            78716964 (+0.234685%)
  L1 Accesses:            115623207 (+0.268041%)
  L2 Accesses:               291029 (-5.125639%)
  RAM Accesses:               62681 (+0.033514%)
  Estimated Cycles:       119272187 (+0.194226%)

bench_sine_gain
  Instructions:            83527005 (+0.325033%)
  L1 Accesses:            122886134 (+0.365048%)
  L2 Accesses:               308987 (-4.551745%)
  RAM Accesses:               62835 (+0.030247%)
  Estimated Cycles:       126630294 (+0.296185%)

bench_sine_gain_delay
  Instructions:           157632228 (+0.226942%)
  L1 Accesses:            224092252 (+0.261607%)
  L2 Accesses:               724083 (-1.844409%)
  RAM Accesses:               64518 (+0.023255%)
  Estimated Cycles:       229970797 (+0.225408%)

bench_buffer_src
  Instructions:            18613888 (+0.937175%)
  L1 Accesses:             26917457 (+1.053067%)
  L2 Accesses:                93433 (-6.894731%)
  RAM Accesses:               96123 (+0.022893%)
  Estimated Cycles:        30748927 (+0.808727%)

bench_buffer_src_delay
  Instructions:            91698570 (+0.283399%)
  L1 Accesses:            126801446 (+0.317627%)
  L2 Accesses:               219921 (+5.257087%)
  RAM Accesses:               96311 (+0.017654%)
  Estimated Cycles:       131271936 (+0.349345%)

bench_buffer_src_iir
  Instructions:            42590408 (+3.444808%)
  L1 Accesses:             61836734 (+3.076097%)
  L2 Accesses:                99856 (-11.30219%)
  RAM Accesses:               96220 (+0.022869%)
  Estimated Cycles:        65703714 (+2.788657%)

bench_buffer_src_biquad
  Instructions:            39283874 (+1.336580%)
  L1 Accesses:             55241000 (+1.538339%)
  L2 Accesses:               196774 (-3.816563%)
  RAM Accesses:               96327 (+0.021805%)
  Estimated Cycles:        59596315 (+1.358240%)

bench_stereo_positional
  Instructions:            47271739 (+3.236776%)
  L1 Accesses:             69839904 (+3.432274%)
  L2 Accesses:               779452 (+8.991705%)
  RAM Accesses:               96492 (+0.023842%)
  Estimated Cycles:        77114384 (+3.544672%)

bench_stereo_panning_automation
  Instructions:            32791310 (+0.785521%)
  L1 Accesses:             48782168 (+0.921552%)
  L2 Accesses:               169025 (-18.87566%)
  RAM Accesses:               96248 (+0.027021%)
  Estimated Cycles:        52995973 (+0.473433%)

bench_analyser_node
  Instructions:            39165878 (+0.434069%)
  L1 Accesses:             54834449 (+0.511961%)
  L2 Accesses:               208391 (-5.544728%)
  RAM Accesses:               96821 (+0.027894%)
  Estimated Cycles:        59265139 (+0.371034%)


Owner

@orottier orottier left a comment

Thanks a lot for your contribution!

I need a bit more time to review and verify it in full, but I left a few first remarks. Could you have a look?

If you wish, you could add a benchmark to benches/my_benchmark.rs to see if it actually outperforms the previous version (it will run in the CI and post a comment on your PR).

@uklotzde
Contributor Author

If you wish, you could add a benchmark to benches/my_benchmark.rs to see if it actually outperforms the previous version (it will run in the CI and post a comment on your PR).

Benchmarking is not helpful here. The allocation might cause issues only very rarely, but in fatal ways. The same applies to mutexes. Lock-free code is often less performant on average than locking.
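To illustrate the point about allocations, here is a minimal sketch of the difference between an allocating and an allocation-free processing call. `Upsampler2x`, `process`, and `process_into` are hypothetical names made up for this example. They are not the code from this PR and not the API of any resampling crate; the sketch only assumes a trivial 2x linear-interpolation upsampler as a stand-in for a real resampler.

```rust
/// Hypothetical 2x upsampler using linear interpolation.
/// It stands in for a real resampler; only the buffer handling matters here.
pub struct Upsampler2x {
    /// Last input sample of the previous call, used for interpolation.
    last: f32,
}

impl Upsampler2x {
    pub fn new() -> Self {
        Self { last: 0.0 }
    }

    /// Allocating variant: builds a fresh `Vec` on every call.
    /// The heap allocation has no bounded execution time, which is the
    /// rare-but-fatal risk when this runs on a real-time thread.
    pub fn process(&mut self, input: &[f32]) -> Vec<f32> {
        let mut output = vec![0.0; input.len() * 2];
        self.process_into(input, &mut output);
        output
    }

    /// Allocation-free variant: writes into a buffer owned by the caller.
    /// Processes as many input frames as fit into `output` and returns
    /// the number of samples written.
    pub fn process_into(&mut self, input: &[f32], output: &mut [f32]) -> usize {
        let frames = input.len().min(output.len() / 2);
        for (i, &x) in input[..frames].iter().enumerate() {
            output[2 * i] = 0.5 * (self.last + x); // interpolated sample
            output[2 * i + 1] = x; // original sample
            self.last = x;
        }
        frames * 2
    }
}
```

With this shape, the caller allocates the output buffer once during setup and reuses it for every block, so the steady-state path never touches the allocator. A benchmark would show little difference on average, which matches the argument above: the benefit is the absence of worst-case stalls, not a better mean.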

@github-actions

Benchmark result:


bench_ctor
  Instructions:             5063063 (+0.232196%)
  L1 Accesses:              7559159 (+0.208247%)
  L2 Accesses:                54111 (+0.011090%)
  RAM Accesses:               61585 (+0.024363%)
  Estimated Cycles:         9985189 (+0.163147%)

bench_sine
  Instructions:            78716970 (+0.234677%)
  L1 Accesses:            115623937 (+0.256801%)
  L2 Accesses:               290303 (-0.956985%)
  RAM Accesses:               62687 (+0.036704%)
  Estimated Cycles:       119269497 (+0.237794%)

bench_sine_gain
  Instructions:            83527015 (+0.325036%)
  L1 Accesses:            122870414 (+0.338513%)
  L2 Accesses:               324714 (+5.763142%)
  RAM Accesses:               62842 (+0.039798%)
  Estimated Cycles:       126693454 (+0.399299%)

bench_sine_gain_delay
  Instructions:           157632304 (+0.226945%)
  L1 Accesses:            224043689 (+0.248001%)
  L2 Accesses:               772720 (+2.228814%)
  RAM Accesses:               64524 (+0.032557%)
  Estimated Cycles:       230165629 (+0.278498%)

bench_buffer_src
  Instructions:            18613906 (+0.937261%)
  L1 Accesses:             26916452 (+1.029204%)
  L2 Accesses:                94470 (-0.630069%)
  RAM Accesses:               96114 (+0.018731%)
  Estimated Cycles:        30752792 (+0.891829%)

bench_buffer_src_delay
  Instructions:            91698592 (+0.283438%)
  L1 Accesses:            126817169 (+0.319560%)
  L2 Accesses:               204233 (+4.367178%)
  RAM Accesses:               96306 (+0.014539%)
  Estimated Cycles:       131209044 (+0.341985%)

bench_buffer_src_iir
  Instructions:            42590424 (+3.444786%)
  L1 Accesses:             61828948 (+3.040860%)
  L2 Accesses:               107666 (+8.044155%)
  RAM Accesses:               96218 (+0.018711%)
  Estimated Cycles:        65734908 (+2.920573%)

bench_buffer_src_biquad
  Instructions:            39283866 (+1.336538%)
  L1 Accesses:             55252702 (+1.505409%)
  L2 Accesses:               185080 (+5.506784%)
  RAM Accesses:               96319 (+0.014537%)
  Estimated Cycles:        59549267 (+1.479580%)

bench_stereo_positional
  Instructions:            47271739 (+3.236754%)
  L1 Accesses:             69842817 (+3.464888%)
  L2 Accesses:               776544 (+5.849398%)
  RAM Accesses:               96486 (+0.013475%)
  Estimated Cycles:        77102547 (+3.425892%)

bench_stereo_panning_automation
  Instructions:            32791352 (+0.785724%)
  L1 Accesses:             48768116 (+0.826245%)
  L2 Accesses:               183131 (+3.715219%)
  RAM Accesses:               96244 (+0.019745%)
  Estimated Cycles:        53052311 (+0.823097%)

bench_analyser_node
  Instructions:            39165877 (+0.434074%)
  L1 Accesses:             54839178 (+0.487415%)
  L2 Accesses:               203669 (+0.534588%)
  RAM Accesses:               96816 (+0.014463%)
  Estimated Cycles:        59246083 (+0.461054%)


@orottier orottier merged commit 3db861a into orottier:main Jun 18, 2023
@orottier
Owner

Lock-free code is often less performant on average than locking.

Fair point! Although I'm reasonably sure process_into_buffer is both lock-free and more performant :)

I have merged your updated code. Thanks again

@uklotzde uklotzde deleted the waveshaper branch June 21, 2023 16:26
Successfully merging this pull request may close these issues.

Use rubato realtime features
2 participants