-
Hi, https://gist.github.com/dingodoppelt/802c40b1cb13c75d96f38b9604fa22df cheers, nils
-
Thanks @dingodoppelt Could you please describe the test session/environment? (i.e. how many clients were connected, which hardware/operating system you were running the server on, and whatever else you feel is noteworthy)
-
@sthenos you mentioned in https://www.facebook.com/groups/507047599870191/?post_id=564455474129403&comment_id=564816464093304 that you're running the server on Linux now.
-
I tested with 12 clients connected from my machine, with small network buffers enabled at a buffer size of 64 samples.
-
Are you still interested in this data? I can run a few tests on Ubuntu over the weekend.
-
One quick comment @WolfganP -- for some reason your build command line … (rather than a simple …). EDIT: Dawn strikes... Yes, it does have …, so if you're on anything but Windows, you'll probably want to … Final edit to note: jamulus.drealm.info is running with profiling. I'll leave it up over the weekend so it should amass a fair amount of data. I'll run the …
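In case it helps anyone reproducing this, here is a minimal sketch of pulling a report off such a profiled server, assuming the binary was built with -pg as in the opening post and is stopped gracefully so it writes gmon.out into its working directory (the binary name and paths are assumptions; adjust to your build):

```bash
# Stop the profiled server gracefully so the gmon.out profile data gets flushed.
pkill -SIGTERM Jamulus

# Generate a readable report from the profile data next to the binary.
gprof ./Jamulus gmon.out > gprof-$(date +%F).txt
```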
-
A different view should come from the Rock and Classical/Folk/Choir genre servers that I've just updated to r3_5_9 with profiling.
They probably won't show much OPUS usage, but this should show anything that's "weird" with server list server behaviour (although they only have about 20 registering servers, unlike Default). I wasn't sure what …
-
@pljones yes, I added …
-
Standard build:
…
This just changes the binary name to "Jamulus", IIRC:
…
This was …
Had a few people tonight noticing additional jitter. Not everyone... Those who noticed - myself included - had just upgraded to 3.5.9. No idea why... (I "fixed" it for the evening by upping my buffer size from 64 to 128.) 14 clients connected to the server and it's looking like this in top:
…
Mmm, I guess those … but it copes with only three in heavy demand. 129 and 131 seem left out. Let's see what the gprof looks like in the morning :).
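As an aside, to see how evenly a session like this spreads over the cores, one option (my suggestion, not what was used above) is mpstat from the sysstat package, or the per-core view in top:

```bash
# Per-core utilisation, refreshed every second (requires the sysstat package).
mpstat -P ALL 1

# Alternatively, run top and press "1" to toggle the per-CPU display,
# or "H" to show the individual threads of the server process.
top
```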
-
OK, I decided to restart the central servers without profiling before I totally forget, so all the numbers are now in.
-
Thx for the info @pljones, good to also have some performance info for the Central Server role. Regarding the info on the audio server role, it seems to confirm the CPU impact of CServer::ProcessData and some Opus routines (I assume as a result of the mix processing inside CServer::OnTimer), and that makes sense (at least to me). Another item I think needs some attention (or verifying that it's already optimized) is the buffering of audio blocks to avoid unnecessary memcopies. But still reading the code :-)
-
Of course @storeilly, the more information the better, to compare performance across different use cases, verify common patterns of CPU usage, and direct optimization efforts.
-
Here is a short test on GCP n1-standard-2 (2 vCPUs, 7.5 GB memory), Ubuntu 18.04.
-
Overnight runs with 1 or 2 connections... Choir meeting later, so I will run again after that.
-
Thanks @storeilly for the files, but those last two cover an extremely short period of app usage; they don't even register significant stats to evaluate (even the cumulative times are 0.00).
-
I am wondering if we are taking the wrong perspective on performance with cloud services. Each cloud service has different approaches to maximizing utilization of their computing and networking resources. Jamulus is unique because we care about real-time performance. Most (ideal) cloud apps care more about lots of computing in bursts and less about real-time performance (or "real-time" to these apps means hundreds of milliseconds). Task switching means buffering, and we know buffering means latency. As we measure the load for additional clients, we should be looking at how buffering and latency change.
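One way to quantify that (my suggestion, not something already used in this thread) is to measure scheduling latency and its jitter directly on the VM with cyclictest from the rt-tests package; for an audio server the maximum and the spread matter more than the average:

```bash
# Scheduling-latency test: locked memory, SCHED_FIFO priority 90,
# 1 ms wake-up interval, 100000 loops (~100 s). Watch "Max" and the spread,
# not just "Avg" -- the spikes are what cause audio drop-outs.
sudo cyclictest -m -p 90 -i 1000 -l 100000
```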
-
Folks: I hope this can help. I am willing to spend the extra for a dedicated 4-CPU instance for a short while if you think that will help.
-
Hi there, I have been playing around with sysbench, a tool for performance measurement, and I found that cloud server performance is pretty good CPU-wise but awful for memory performance, where my dedicated machine really shines. I ran this test on my home machine:
…
and on my cloud server:
…
This doesn't look too good in comparison. Maybe this is the bottleneck? Do you think sysbench could be a reliable tool to measure server performance instead of trial and error, or are there any other tools I could try?
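For reference, a sketch of the kind of sysbench runs that can produce such a comparison (my own example invocations, not necessarily the exact ones used above; syntax is for sysbench 1.0+):

```bash
# CPU benchmark: prime calculation up to 20000.
sysbench cpu --cpu-max-prime=20000 run

# Memory throughput benchmark: 1 MiB blocks, 10 GiB total transferred.
sysbench memory --memory-block-size=1M --memory-total-size=10G run
```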
-
How much memory is used by a Jamulus server thread?
-
On my Windows system at home a server uses only about 60 MB of memory.
-
I meant Jamulus memory usage, as you've given. The test was about memory throughput, if I read it correctly. If Jamulus isn't memory-constrained, then the test shown won't be representative of Jamulus performance.
-
I just wondered if that might be the issue with the cloud servers. The CPU performance is fine and doesn't really deviate from what I measure on real hardware. The only thing I could find using sysbench was the restricted memory throughput in comparison to real hardware, so I figured this might be another thing to look at, since cloud servers die long before the CPU is used up.
-
On average, that may be true. Are you getting a reading for consistency of performance - i.e. how much the CPU performance deviates between maximum throughput and minimum? As noted above, it's that stability that Jamulus needs and which directly affects its capacity.
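sysbench can actually report that spread: every run prints a latency block with min/avg/max and the 95th percentile, so a longer timed run gives a rough consistency reading (again just a suggested invocation, assuming sysbench 1.0+):

```bash
# 60-second CPU run; in the output, compare the "Latency (ms)" section.
# A large gap between avg and max (or a high 95th percentile) suggests the
# vCPU is being throttled or pre-empted -- exactly what hurts Jamulus.
sysbench cpu --cpu-max-prime=20000 --time=60 run
```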
-
@dingodoppelt I don't know if you are on Facebook, but there is a report about successfully having 53 clients connected to a Jamulus server on a 4-CPU virtual server: https://www.facebook.com/groups/619274602254947/permalink/811257479723324: "Had 53 members of a youth orchestra this evening on Jamulus (and another 15-20 listening on Zoom). Took about 90 minutes of setup so we only got through a reading of Jingle Bells at the end but it was a great first step! AWS 4 vCPU server hit ~55%."
-
@corrados: my server does this too, but not with every client on small network buffers. I've played on servers with around 50 people, but you can never tell if everybody has small network buffers enabled. In my tests I connected every client with the same buffer size and small network buffers enabled. It only worked for me on dedicated hardware (namely the WorldJam and Jazzlounge servers).
-
I haven't done any testing / thorough research here yet, but just a heads up: there are also several kernel parameters for the UDP networking stack, and general network parameters, that could be tuned with sysctl and might have a positive effect: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
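As a starting point, a sketch of the kind of knobs meant here (the values are illustrative guesses, not tested recommendations; they mainly enlarge the socket buffers and the per-device backlog for UDP-heavy workloads):

```bash
# Raise the maximum and default socket receive/send buffer sizes (bytes).
sudo sysctl -w net.core.rmem_max=2621440
sudo sysctl -w net.core.wmem_max=2621440
sudo sysctl -w net.core.rmem_default=524288
sudo sysctl -w net.core.wmem_default=524288

# Allow more packets to queue per device before the kernel starts dropping them.
sudo sysctl -w net.core.netdev_max_backlog=5000

# To make settings persistent, put them in a file under /etc/sysctl.d/ instead.
```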
-
@sbaier1 I am out on the Internet frontier (i.e. far from the small distances of Europe) and see the network behaviour being a dominant contributor to latency-dependent performance. I'd be interested in a discussion about what can be done to quench traffic and discard packets. These mechanisms might be a good way to improve performance (at the expense of audio interruption, which would be happening anyway). Especially with regard to a different thread on buffer backups (they called it bufferbloat), the only way to manage problems in the network with packet backup at some routers would be creating some code to detect backups and quench traffic. I have some musicians that will "tolerate" 20-70 ms latency rather than not have music. Actively managing the packet rate at 40+ ms would greatly improve the experience. (Note, I am thinking that some of the buffer backups are the interaction between our UDP traffic and other people's TCP cross traffic.)
-
There's a new PR for multi-threading: #960
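For anyone who wants to try it once it lands: in server builds that include the feature, multithreaded mixing is enabled with a command-line switch (the option name below follows current builds and may differ in the PR; check `./Jamulus --help` on your build):

```bash
# Headless server with multithreaded audio mixing enabled.
./Jamulus --server --nogui --multithreading
```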
-
Thanks @ann0see for pointing me to this thread. I could not read all the comments here, but I want to add my findings. I noticed difficulties fitting more than about 17 clients on my Hetzner cloud vServer, even though I tried configurations from 2 to 16 cores. My guess is that the CPU cores on my cloud server are not as strong as the ones the current multi-threading code was developed for; I see a comment mentioning Amazon cloud servers there. I have only tested this change with up to 21 clients, and I see much better CPU usage across cores.
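To check how the server's threads are actually being spread over the cores during such a test, one option (my suggestion, assuming the sysstat package and a binary named Jamulus) is per-thread statistics with pidstat, or the thread view in top:

```bash
# Per-thread CPU usage of the running server, refreshed every second.
pidstat -t -p "$(pidof Jamulus)" 1

# Or: run top, press "H" for the thread view and "1" for the per-core summary.
```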
-
I also started playing around with the performance of Jamulus, since I would like to host an event in April with 50+ singers. Looking good so far on a 4 vCPU virtual server at IONOS. A problem I had with load testing seems to originate in the DDoS detection system of IONOS: they seem to shut down network access to the server for about 60 minutes if I open many Jamulus sessions in a short time (all originating from the same machine). Nevertheless, I would like to share my load driver script with you, which creates n Jamulus instances, connects them to a server and sends pink noise there. It is inspired by @maallyn's script above, but also wires the Jamulus inputs and the sound source automatically. (If I do it manually, I find njconnect quite convenient.) This way, it can run on a headless server. On Ubuntu, I have to first run … Here's the start script, which has to be started with …
We can use … To stop the test, I use …
The script is a bit hacky and not very elaborate, so please feel free to improve it. One caveat: Jack has a hard-coded limit of 64 clients per machine (https://github.com/jackaudio/jack2/blob/develop/common/JackConstants.h). To circumvent this, one would have to build a custom version of Jack from source. (Or use more machines as load drivers.)
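Since the original script did not survive the copy, here is a minimal sketch of a load driver along the same lines. All of it is a reconstruction: the dummy JACK backend, the Jamulus client flags, the JACK port names, and the assumption of an already-running pink-noise JACK client called "noise" all need to be adapted to your setup.

```bash
#!/bin/bash
# Hypothetical load-driver sketch, NOT the original script from this comment.
# Usage: ./loadtest.sh <server[:port]> <num_clients>
SERVER="${1:?usage: $0 <server[:port]> <num_clients>}"
NUM="${2:-4}"

# Dummy JACK backend so no sound card is needed on the load machine.
jackd -d dummy -r 48000 -p 64 &
sleep 2

for i in $(seq 1 "$NUM"); do
  # One headless Jamulus client per instance; --nojackconnect suppresses the
  # automatic wiring so the noise source can be patched in explicitly.
  ./Jamulus --nogui --nojackconnect --clientname "load$i" --connect "$SERVER" &
  sleep 1
  # Wire the assumed pink-noise JACK client into this client's input ports.
  jack_connect "noise:out_1" "load$i:input left"
  jack_connect "noise:out_2" "load$i:input right"
done

wait
```

Tearing it down can be as simple as `pkill -f Jamulus` followed by `pkill jackd`.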
-
Follows from #339 (comment) for better focus of the discussion.
So, as the previous issue started to explore multi-threading on the server for better use of resources, I first ran a profiling of the app on Debian.
Special build with:
qmake "CONFIG+=nosound headless noupcasename debug" "QMAKE_CXXFLAGS+=-pg" "QMAKE_LFLAGS+=-pg" -config debug Jamulus.pro && make clean && make -j
Then ran it as below, connecting a couple of clients for a few seconds:
./jamulus --nogui --server --fastupdate
Once the clients disconnected, I gracefully killed the server:
pkill -sigterm jamulus
And finally ran gprof, with the results posted below:
gprof ./jamulus > gprof.txt
https://gist.github.com/WolfganP/46094fd993906321f1336494f8a5faed
It would be interesting to see those who observed high CPU usage run test sessions and collect profiling information as well, to detect bottlenecks and potential code optimizations before embarking on a multi-threading analysis that may require major rewrites.