Final touches for publishing

memfault · Aug 27, 2024 · c28f069
1 parent 467679a
commit c28f069

Showing 2 changed files with 64 additions and 64 deletions.
diff --git a/_data/authors.yml b/_data/authors.yml
@@ -272,6 +272,6 @@ mschulte:
     blurb: is a firmware engineer trying to firmware
 bertschiettecatte:
     name: Bert Schiettecatte
-    image: /img/author/bertschiettecatte.jpg
+    image: /img/author/bertschiettecatte.png
     linkedin: https://www.linkedin.com/in/bertschiettecatte/
     blurb: is the founder of Noisetron LLC, a software & hardware engineering and expert witness consulting practice, and the founder of Percussa, a pro-audio consumer electronics company.
diff --git a/_posts/2024-08-27-memory-debugging.md b/_posts/2024-08-27-memory-debugging.md
@@ -1,5 +1,5 @@
 ---
-title: How Memory Usage Patterns Can Derail Real-time Performance
+title: Preventing Real-Time Disruptions — Debugging an Inefficient Memory Usage Pattern
 description:
   A story about memory usage causing real-time performance issues in an audio
   system.
@@ -18,8 +18,6 @@ issues, ending with my solution and lessons learned.
 
 <!-- excerpt end -->
 
-Optional motivation to continue onwards
-
 {% include newsletter.html %}
 
 {% include toc.html %}
@@ -37,14 +35,14 @@ metal DSP platform and do not run an off-the-shelf operating system.
 Subsequently, I developed the
 [Percussa Super Signal Processor (SSP)](https://www.percussa.com/super-signal-processor/),
 a multichannel audio DSP platform, which was funded via Kickstarter for 314%.
-The SSP is used by professional studio musicians and sound designers world-wide
+The SSP is used by professional studio musicians and sound designers worldwide
 and is a eurorack module (eurorack is a modular synthesizer standard). It is
 installed into end users’ synthesizer racks or in recording studios. It offers a
-library of DSP building blocks which can be chained together to create various
-signal processing chains, for synthesizing new sounds or processing existing
+library of DSP building blocks that can be chained together to create various
+signal-processing chains for synthesizing new sounds or processing existing
 sounds.
 
-The SSP is a Linux-based platform, based on a Rockchip SoC. In addition to
+The SSP is a Linux-based platform based on a Rockchip SoC. In addition to
 developing the hardware and bringing up the platform, I also developed the
 real-time audio DSP software for the SSP. To deliver a great customer
 experience, it was important to get consistent audio DSP performance from the
@@ -68,7 +66,7 @@ user interface code, menu systems, and code to parallelize DSP code across CPU
 cores. My goal on my embedded Linux platform was to use the mainline kernel tree
 to reduce maintenance costs.
 
-## The problem
+## Audio Issue Diagnosis
 
 As I started testing and debugging my application, I kept running into issues
 where the audio output was regularly interrupted, resulting in an audible click.
@@ -78,82 +76,83 @@ locks in my code, and whether the audio callback could be pre-empted by the
 operating system, causing it to return too late and miss its real-time
 “deadline.”
 
-Eventually I decided to use a profiler to figure out what was going on and
+Eventually, I decided to use a profiler to figure out what was going on and
 turned to the excellent [Tracy profiler](https://github.com/wolfpld/tracy),
 which is open source and free. Tracy is easy to integrate with your code,
 allowing you to define different zones for tracing and profiling. The Tracy
 profiling GUI application can connect over the network with your embedded
-platform and receive trace information in real-time. This helps speed up
+platform and receive trace information in real time. This helps speed up
 development and debugging. Tracy also comes with a command-line capture tool you
 can use to capture the trace directly to disk instead of sending it over the
 network to your workstation.
 
 I typically do not enable Tracy in release builds shipped to customers and only
 use it when I’m trying to find a problem with the real-time behavior of my
 software. I use compiler switches (`-D`) to enable Tracy, as described in the
-Tracy manual, which also describes how to control the amount of data collected.
-This is important, as you do not want to generate so much tracing data that it
-starts to get in the way of determining the real problem at hand.
-
-<p align="center">
- <img width="100%" src="{% img_url memory-usage-in-audio-system/trace1.png %}" alt="Image of Tracy (trace 1)" />
-</p>
+[Tracy manual (PDF warning)](https://github.com/wolfpld/tracy/releases/latest/download/tracy.pdf),
+which also describes how to control the amount of data collected. This is
+important, as you do not want to generate so much tracing data that it starts to
+get in the way of determining the real problem at hand.
 
 With the help of Tracy, I discovered that my audio callback was occasionally
 taking too long to return. This happened often enough that it was a real
 problem.
 
-In the above screenshot, you can see an instance when the ALSA callback takes
-much longer than normal. At the bottom of the screen, you can see the different
-nested zones that I defined in my source code, with Tracy measuring the time
-spent in each. The zones allow you to understand which areas of your code are
-the most problematic and need review.
+<p align="center">
+ <img width="100%" src="{% img_url memory-usage-in-audio-system/trace1.png %}" alt="Image of Tracy (trace 1)" />
+</p>
 
-What was interesting was that the issue always seemed to be connected to a
-memory-related function call. Right above the zones at the bottom, you can see a
-series of blue dots. When hovering over them, these dots showed the same
-memory-related function repeatedly, indicating that the CPU was spending an
-unusually long time in this function. Additionally the yellow graph at the
-bottom (indicated by yellow arrows) shows memory usage constantly increasing or
-decreasing, resulting in the zig-zag pattern being plotted.
+At the top of the screen in Tracy, you can see which CPU cores the threads are
+being scheduled on (note the four “lanes” in the above screenshot, since the
+software is running on a quad-core CPU). At the bottom of the screen, you can
+see the different nested zones that I defined in my source code, with Tracy
+measuring the time spent in each. The zones allow you to understand which areas
+of your code are the most problematic and need review.
+
+You can see in the above screenshot an instance when the ALSA callback takes
+much longer than normal. What was interesting was that the issue always seemed
+to be connected to a memory-related function call. Right above the zones at the
+bottom, you can see a series of blue dots. When hovering over them, these dots
+showed the same memory-related function repeatedly, indicating that the CPU was
+spending an unusually long time in this function. Additionally, the yellow graph
+at the bottom shows memory usage constantly increasing or decreasing, resulting
+in the zig-zag pattern being plotted.
 
 However, I had paid close attention to the code in my audio callback and DSP
-worker threads, and I was fairly confident I was not doing any kind of dynamic
+worker threads and I was fairly confident I was not doing any kind of dynamic
 memory allocation in my audio or DSP threads.
 
-As I dug deeper, I noticed that my audio thread was being pre-empted. ALSA
-support was being provided by the third-party framework. Eventually I started
-examining the memory usage behavior of my application (and third-party
-framework) and discovered that the application framework I was using had its own
+As I dug deeper, I noticed that my audio thread was being pre-empted by kernel
+threads.
+
+<p align="center">
+ <img width="100%" src="{% img_url memory-usage-in-audio-system/trace2.png %}" alt="Image of Tracy (trace 2)" />
+</p>
+
+While I don't have an exact copy of the pre-emption from this instance, I have
+an example screenshot above which is very similar. In this case, you can see
+that the ALSA thread is being pre-empted with a memory-related function call
+appearing at the bottom of the screen shortly before the scheduling of kernel
+threads.
+
+> Note: Tracy can sample Linux kernel call stacks in addition to your
+> application call stacks to give an even deeper look at what the kernel is
+> doing. Refer to the Tracy manual to understand the requirements for this.
+
+Eventually, I started examining the memory usage behavior of my application and
+discovered that the third-party application framework I was using had its own
 memory management classes. Under the hood, `malloc()` was being called to
-allocate blocks of very small sizes—some as small as 4 bytes! These classes were
-used all over the code base in the framework—and thus in my application, which
-was based on the framework. I had unwittingly invited dynamic memory allocation
-in places due to the way the framework was architected.
+allocate blocks of very small sizes -- some as small as 4 bytes! These classes
+were used all over the code base in the framework -- and thus in my application,
+which was based on the framework. I had unwittingly invited dynamic memory
+allocation in places due to the way the framework was architected.
 
 Because of the frequency with which these calls happened, it resulted in memory
 fragmentation, which explained why some memory-related function calls took so
 long to return. Fragmentation can result in memory-related function calls taking
 much longer than normal and/or the Linux kernel pre-empting your thread(s) while
 it performs its cleanup.
 
-Tracy supports sampling call stacks, and it can also sample Linux kernel call
-stacks (refer to the Tracy manual to understand the requirements for this).
-
-<p align="center">
- <img width="100%" src="{% img_url memory-usage-in-audio-system/trace2.png %}" alt="Image of Tracy (trace 2)" />
-</p>
-
-At the top of the screen in Tracy, you can see which CPU cores the threads are
-being scheduled on (note the four “lanes” in the above screenshot, since the
-software is running on a quad-core CPU).
-
-Tracy can also show when kernel threads are being scheduled, as shown in grey in
-the screenshot above, which can help you understand if any of your threads are
-being pre-empted. You can see that the ALSA thread is being pre-empted, with a
-memory-related function call appearing at the bottom of the screen shortly
-before this happens.
-
 By analyzing the call stacks, combined with callbacks taking unusually long to
 return, excessive time spent in memory-related functions, and the scheduling of
 kernel threads, you can get an idea of whether your application’s memory usage
@@ -163,7 +162,7 @@ In my particular case, my application’s memory usage patterns caused my
 callbacks and other code to take too long to return, resulting in missed
 real-time deadlines and audio clicks.
 
-## The solution
+## Solution
 
 To solve this problem, I implemented my own memory allocator and re-implemented
 `malloc()` and friends to use my allocator. At a high level, my allocator
@@ -185,19 +184,20 @@ objects on startup, which are then re-used as much as possible.
 
 While continuing my efforts with profiling and tracing using Tracy, I also
 learned the importance of being aware of the memory allocation behavior of the
-tracing code (and third-party framework code!) that you integrate in your
+tracing code (and third-party framework code!) that you integrate into your
 application. Tracy has its own allocator, which is great, but one needs to be
 mindful of the tracing features enabled and the volume of data being collected
 to avoid affecting the real-time behavior of the application being traced.
 
 ## Conclusion
 
-If you are working on a real-time application—whether it is an audio processing
-application, a low-latency proprietary trading application, or anything else
-that needs to have real-time or low-latency behavior—profile your application in
-various ways and be aware of the memory usage patterns of your application. And
-don't forget to examine the behavior under the hood of ALL third-party libraries
-in your application (including tracing or profiling code or libraries). 🤓
+If you are working on a real-time application -- whether it is an audio
+processing application, a low-latency proprietary trading application, or
+anything else that needs to have real-time or low-latency behavior, profile your
+application in various ways and be aware of the memory usage patterns of your
+application. And don't forget to examine the behavior under the hood of ALL
+third-party libraries in your application (including tracing or profiling code
+or libraries). 🤓
 
 <!-- Interrupt Keep START -->