Improvements in latency? #201

rerdavies · 2024-09-21T22:53:16Z

rerdavies
Sep 21, 2024
Maintainer

Stable, no under-runs, with plugins using 85% of available CPU, 64x3 buffer configuration, MOTU M2 USB audio adapter!

I'm not sure when it happened, but all of a sudden I seem to be getting it.

I was doing some testing on changes made to audio handling. Hoping to cause ALSA audio to get upset, I added a TooB NAM plugin AND a TooB ML plugin with one of the large Proteus models to the same preset. And it ran stably with NO under-runs! At 85% CPU use, with a 64x3 buffer configuration. (It also seems fine at 32x3 and 32x4.)

There is a fix in TooB NAM which prevents memory allocations in the audio thread (a bug in the underlying Neural Amp Modeler library, which was fixed and pushed back upstream to Steven Atkinson's project), which allows NAM to run with less variability in each audio frame.

There were a bunch of updates to audio subsystems and drivers in a recent Raspberry Pi OS release. I'm wondering whether Raspberry Pi OS has made changes that account for the difference. One very reasonable thing that might account for it: updated device drivers for non-audio devices that have full RT_PREEMPT patches applied.

At any rate, I am quite amazed by this. I have never seen audio work stations on any OS running stably with 85% plugin CPU use.

So, I'm asking users of PiPedal to give me some feedback on what kind of CPU use works for you while still not under-running, as well as what kinds of buffer configurations are working for you, when you're using the latest release of PiPedal with full Raspberry Pi OS updates applied.

I'm also curious about whether I should be allowing things like 16x6 or 16x8 buffer configurations. These sorts of buffer configurations do work, and at first glance, they seem to be surprisingly stable. And I do have reason to think that 32x4 might actually be more stable than 64x3, while providing even lower latency. But I haven't yet done sufficient testing to make that an actionable point. So I'm asking.

In passing, so you know: graphics operations in Bookworm have a dire effect on audio stability. Much more so than in Buster. Just moving the cursor will cause under-runs on my system, even with large audio buffers. The solution: run headless, or disconnect your HDMI cable. Bookworm seems to shut down the GPU if there's no display attached, which is a very good thing for PiPedal. That may not be surprising to you, but it is quite a big change from previous versions of Raspberry Pi OS. I suspect that the problem is more than one of relative priority of GPU and audio interrupts, and may have more to do with the fact that graphics operations put a heavy load on the CPU's L2 memory caches. The heavyweight plugins (ML, NAM, and the convolution plugins) are carefully optimised to make best use of CPU L1 and L2 memory caches. So even low-priority processes can and will cause audio under-runs if they do big flat memory operations that invalidate the caches being used by real-time audio.

38github · 2024-11-25T21:04:10Z

38github
Nov 25, 2024

I use a RPi4 and can run two instances of NAM (one STANDARD and one LITE or FEATHER) plus IR, expander, split, EQ, delay with 64/2.

I also tested the Radxa Zero 3W with one STANDARD NAM, which used around 80% CPU but got xruns when using 64/2. I have not tried increasing to 3 or 4 yet. The board gets very hot even with a heat sink. The WiFi on it is really bad and doesn't work well with 2.4GHz in my tests. It creates a lot of rxfrag errors that hog the CPU and journald. As long as WiFi is not used it works quite well.

On a Libre La Frite I can use one FEATHER NAM and lots of additional effects at 64/2. It also does not get hot and has been very stable and most of it works off of mainline kernel.

PiPedal has been incredibly stable on my devices, while MODEP used a lot of CPU, which caused xruns even on an RPi4, and also on an RPi5, I think.

rerdavies · 2024-11-25T22:56:00Z

rerdavies
Nov 25, 2024
Maintainer Author

@38github:

Buffer size

FYI, I think 64x2 is not a good choice of buffer size. 32x4 will definitely give you fewer underruns, with equivalent or better latency. I do know that I did release a version of PiPedal that defaulted to 64x2 buffers, but it quickly became apparent to me that this was a very bad choice. PiPedal currently defaults to 64x3 buffers, but I have suspected for quite some time that it should actually be defaulting to 32x4. I'm pretty sure this is true, even on very lightweight hardware. Unfortunately, I've been chasing higher-priority issues for a while, so I haven't had the luxury of being able to play as much as I should before making a fairly risky change. There's even some reason to think that 16x5 or 16x6 buffer configurations might be a good idea.

I still don't have a firm understanding of how audio buffer configuration works on Linux. There seems to be lots of lore, and not a whole lot of detail. My current best understanding is that buffer size primarily affects PiPedal's software rendering, and the number of buffers affects how much breathing room ALSA has to keep the hardware fed. And that, under the covers, ALSA is mostly chasing buffer pointers for USB (or I2S), and doesn't really care what the buffer sizes are, just how many bytes of data it has to play with. In point of fact, the ALSA driver doesn't even get told what the buffer size is, just the value of SIZE x NUMBER OF BUFFERS.

There is deep lore that says that buffers should be a multiple of 48 bytes for USB audio devices. This may have been true for USB 1.0 devices, but I think it is no longer true for USB 2.0+ devices, which have a more efficient way to transfer bulk data. So I sincerely believe that piece of lore should be retired, except for exceptionally ancient hardware.

So I think... 32x4 is much better because PiPedal needs one buffer in which to process input, which gives the OS up to 32x3=96 samples of data to feed the hardware with. At any given time, ALSA is feeding the hardware with some portion of that buffered data, but it can release at least two of the three buffers back to PiPedal so that PiPedal can start filling them again. So 1 buffer for PiPedal; 1 buffer to feed the hardware; and 2 buffers to keep things running smoothly.

In the 64x2 case, PiPedal needs one buffer to fill, but it can't get access to the next buffer until the OS has finished transferring the last byte of the other buffer to hardware. So 1 buffer for PiPedal to fill, and some potentially very small lead time between the time that the hardware transfer completes and the time that PiPedal gets to start filling a new buffer. A disaster waiting to happen! So for the 64x2 case, there's no spare buffer. PiPedal has to process the entire buffer in the interval between when the hardware releases the end of the previous buffer and when it starts requesting data for the start of the next buffer.

(Omitted for the sake of simplicity: input and output each get 32x4 buffers, so the same general argument holds for input buffers if you were just reading input data. In actual fact, input and output transfers are locked together, so it's not clear how many of the input buffers actually get used.)
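
A minimal sketch of the arithmetic behind that argument, assuming a 48 kHz sample rate and the simplified model described above (one buffer held by PiPedal while it renders, the rest queued for the hardware). The `BufferConfig` class and its field names are hypothetical and are not taken from PiPedal's source.

```python
from dataclasses import dataclass

# Assumption: 48 kHz. The post above does not fix a sample rate.
SAMPLE_RATE_HZ = 48_000


@dataclass(frozen=True)
class BufferConfig:
    """Hypothetical helper: a 'SIZE x COUNT' buffer configuration."""
    frames_per_buffer: int  # the "32" in 32x4
    buffer_count: int       # the "4" in 32x4

    @property
    def total_frames(self) -> int:
        # SIZE x NUMBER OF BUFFERS: the only figure the driver is said to see.
        return self.frames_per_buffer * self.buffer_count

    @property
    def headroom_frames(self) -> int:
        # Simplified model: one buffer is being filled by PiPedal,
        # everything else is queued audio the hardware can drain meanwhile.
        return self.frames_per_buffer * (self.buffer_count - 1)


def frames_to_ms(frames: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert a frame count to milliseconds of audio."""
    return 1000.0 * frames / sample_rate_hz


for cfg in (BufferConfig(64, 2), BufferConfig(64, 3), BufferConfig(32, 4)):
    print(
        f"{cfg.frames_per_buffer}x{cfg.buffer_count}: "
        f"total {cfg.total_frames} frames ({frames_to_ms(cfg.total_frames):.2f} ms), "
        f"headroom {cfg.headroom_frames} frames ({frames_to_ms(cfg.headroom_frames):.2f} ms)"
    )
```

Under this model, 64x2 and 32x4 hold the same total amount of audio (128 frames), but 32x4 leaves 96 frames queued for the hardware while PiPedal renders, against 64 frames for 64x2.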

Problems with Wi-Fi

One of the nice things about the Pi 4 and above is that WiFi and USB run on separate buses. On older Pis, the WiFi device appears as a USB device, and shares an internal USB bus with USB audio. If your troublesome devices have USB 2.0 AND USB 3.0 connectors, you might want to experiment with using either the USB 2.0 or USB 3.0 ports, which may get their own dedicated buses and controllers. On my Pi 4, I take great care to ensure that my SSD drive goes on a USB 3.0 port, and my USB audio device goes on the USB 2.0 bus (so that it doesn't share an internal bus with the SSD).

BorisSutin · 2024-11-27T14:35:50Z

BorisSutin
Nov 27, 2024

I use 32x4 on pi5 with a hardware codec and it is more stable than 64x2. The difference in latency is minimal. If you use 16, the sound with neural plugins starts to disappear completely. Apparently the network window can't be larger than the buffer.

briandsc · 2025-02-28T14:51:11Z

briandsc
Feb 28, 2025

Sorry to bring a thread back from the dead, but I'm looking into this stuff and not understanding the buffer. I use my MOTU mostly at a 48000 sample rate and a 64-sample buffer. How does that equate with the buffers you guys are talking about, like 64x2 and 32x4?

SkyhawkBS · 2025-03-01T05:41:45Z

SkyhawkBS
Mar 1, 2025

If you are not a programmer, then it will be difficult for you to understand the operation of the audio buffer. It always consists of at least two parts. One part is continuously being recorded into, while the other part is being processed and then transferred. In Linux, unlike Windows (WASAPI) and macOS (Core Audio), you can specify the number of such periods, increasing it to 4 or more. This will increase stability and reduce the risk of xruns, but it will also have an impact on latency. For this reason, people always try to find a compromise between speed and stability.

rerdavies · 2025-03-03T18:05:31Z

rerdavies
Mar 3, 2025
Maintainer Author

As a programmer, it's difficult to understand the operation of the audio buffers, too! :-)

Assuming you have selected 64x4 buffers for the sake of simplicity...

A sample "frame" is one 32-bit floating point value for a mono signal, two 32-bit floating point values for a stereo signal, or N floating point values for an N-channel audio signal.

PiPedal reads 64 sample frames at a time from the audio device input, processes all 64 of them in one go, and writes the processed buffer back to the audio adapter. So that's what the buffer size is: how many frames at a time PiPedal processes. So the "64x" part affects only PiPedal.

As far as the actual device itself is concerned, on Linux it actually has one buffer consisting of 64x4=256 sample frames. (Actually one buffer for input, one buffer for output, each of 64x4=256 sample frames). And then the hardware chases the available data as fast as it can in ways that are not entirely well understood, to be perfectly honest.
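
As a rough sketch of how a setting such as "64x4" maps onto those two views, assuming stereo audio and the 32-bit float frames described above (the format the device actually uses on the wire may differ). The function names and dictionary keys are hypothetical, chosen for illustration only.

```python
def parse_buffer_config(text: str) -> tuple[int, int]:
    """Split a 'SIZExCOUNT' string such as '64x4' into (size, count)."""
    size, count = text.lower().split("x")
    return int(size), int(count)


def ring_layout(text: str, channels: int = 2, bytes_per_sample: int = 4) -> dict:
    """Describe one direction (input or output) of the device-side buffer.

    Assumptions: stereo by default, 4 bytes per sample (32-bit float).
    """
    frames_per_buffer, buffer_count = parse_buffer_config(text)
    ring_frames = frames_per_buffer * buffer_count
    return {
        # What PiPedal processes in one go.
        "process_frames": frames_per_buffer,
        # What the device sees: a single buffer of SIZE x COUNT frames.
        "ring_frames": ring_frames,
        "ring_bytes": ring_frames * channels * bytes_per_sample,
    }


print(ring_layout("64x4"))
```

With these assumptions, "64x4" means PiPedal processes 64 frames per pass, while each direction of the device holds one 256-frame (2048-byte) buffer.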

The user interface could, in fact, allow users to specify the device's buffer size in frames (259 sample frames, for example), although there's good reason to think that some devices would perform badly if you did. Specifying it the way PiPedal does is a fairly widely used convention on both Linux and Windows audio systems. And there IS actually good reason to make sure that the audio device's buffer is an integer multiple of the size of PiPedal's buffer.

Smaller buffers are generally better -- to a point. There is always a minimum of one buffer's worth of delay when processing. So smaller buffers will generally have lower audio latency, all things being equal. However, there is a certain amount of fixed system overhead for processing each buffer. The amount of overhead is not that significant when running with 64-sample buffers; but it would require (probably) about 5% extra system overhead when using 32-sample buffers, and about 35%(?) overhead when using 8-sample buffers.

And more buffers increase latency, but reduce the probability of audio under-runs, where the system can't feed the hardware fast enough.
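
To see why the fixed per-buffer cost matters more as buffers shrink, here is a small sketch that counts how many buffers per second have to be processed and how long each one lasts. It assumes a 48 kHz sample rate; the function names are hypothetical, and the overhead percentages quoted above are estimates that this code does not try to reproduce.

```python
# Assumption: 48 kHz sample rate.
SAMPLE_RATE_HZ = 48_000


def buffers_per_second(frames_per_buffer: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """How many times per second the fixed per-buffer overhead is paid."""
    return sample_rate_hz / frames_per_buffer


def buffer_duration_ms(frames_per_buffer: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Length of one buffer in milliseconds: the minimum one-buffer delay,
    and also the time available to process it before the next one is due."""
    return 1000.0 * frames_per_buffer / sample_rate_hz


for frames in (64, 32, 16, 8):
    print(
        f"{frames:>2}-frame buffers: "
        f"{buffers_per_second(frames):6.0f} buffers/s, "
        f"{buffer_duration_ms(frames):.3f} ms each"
    )
```

Halving the buffer size halves the minimum delay but doubles the number of times per second the fixed overhead is incurred.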

Unfortunately, there's a fair bit of hidden buffering going on in the overall system. And USB audio hardware introduces additional buffers and delays. So all of these are general rather than exact principles.

At the time PiPedal was first written, there was a fair bit of deep lore that claimed that USB audio adapters performed much better if their buffers were multiples of 48 bytes (the size of a data frame on USB 1.0). So that's why PiPedal makes sure that you can always select 3 buffers if you want to. I don't actually think that's true anymore on USB 2.0+ devices. But, for historical reasons, PiPedal uses a 64x3 buffer configuration by default. The best value actually depends on what audio device you're using, how heavily you're loading the processor, what kinds of plugins you're using, and whether the host system is headless or not.
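
A quick check of which configurations satisfy that piece of lore, as a sketch only. It assumes 8-byte frames (stereo, 4 bytes per sample); the format a given device actually uses may differ, and the function names are hypothetical.

```python
LORE_MULTIPLE_BYTES = 48  # the figure quoted in the lore above


def ring_bytes(frames_per_buffer: int, buffer_count: int, bytes_per_frame: int = 8) -> int:
    """Total buffer size in bytes. Assumption: 8-byte frames (stereo, 4 bytes/sample)."""
    return frames_per_buffer * buffer_count * bytes_per_frame


def fits_lore(frames_per_buffer: int, buffer_count: int, bytes_per_frame: int = 8) -> bool:
    """True when the total buffer size is a whole multiple of 48 bytes."""
    return ring_bytes(frames_per_buffer, buffer_count, bytes_per_frame) % LORE_MULTIPLE_BYTES == 0


for size, count in ((64, 2), (64, 3), (32, 3), (32, 4)):
    print(f"{size}x{count}: {ring_bytes(size, count)} bytes, multiple of 48: {fits_lore(size, count)}")
```

With power-of-two buffer sizes and power-of-two frame sizes, only buffer counts that are a multiple of 3 produce a multiple of 48 bytes, which is consistent with always offering a 3-buffer option.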

My current thinking: 32 sample buffers are almost always better than 64 sample buffers; 16-sample buffers require too much overhead. And, in retrospect, I think there might be value in allowing buffer configurations like 32x6, which PiPedal does not currently allow.

The original posting was meant to float the idea that selecting 32x4 buffers is actually significantly better in all respects than using the default, and to serve as an exploratory poke to see if I should consider changing the default configuration. (I have not.)

2 replies

SkyhawkBS Mar 3, 2025

yes, it's worth it

briandsc Mar 3, 2025

i see ok. so the configuration of the pipedal that you set is telling the audio interface how its running then.
