High CPU and Low throughput #2105
-
Hi all, we are using Crossbar version 20.7.1 in production as a WAMP router. We have developed Python apps that make RPC calls to systems which are also connected to the router. Until now everything worked well, but we have run into problems since introducing file uploads. The server application makes RPC calls to the system to upload files in 128 KB chunks. With only 20 systems uploading files, Crossbar's CPU usage reaches 100% and the router chokes. Overall throughput also drops because of the high CPU usage: each server uploads a total of about 7 to 8 GB in 4 hours. We recently set validate_utf8 to false, which improved performance by 20 to 30%, but the overall file upload process is still slow because the CPU stays at 100%. Our other RPCs carry simple JSON structures, but the file upload RPC transfers binaries, videos, and other types of large files.
Below are our Crossbar settings. Any help would be appreciated.
Replies: 3 comments 6 replies
-
You should use a recent Linux (or FreeBSD), a CPU architecture supported as a PyPy JIT target (x86-64 or ARMv8; RISC-V is sadly not supported yet), multiple cores (e.g. a CPU with 32 to 128 cores), and a mainboard with sufficient main memory bandwidth and capacity.
Run a multi-core router worker setup in Crossbar.io to scale vertically to your number of CPU cores. This will achieve ~50,000 routed RPC calls per second (each routed RPC call amounts to 4 WAMP messages) and GB/s of aggregate RPC payload bandwidth, per CPU core used. I've tested vertical scalability at millions of routed RPCs/s and multiple GB/s on up to ~100 CPU cores, and I'd expect it to scale vertically to at least 256 cores (on Linux) on a single machine. If you need more performance than a single machine can provide, you need a multi-node router cluster using Crossbar.io router-to-router links (r2r links).
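For a rough sense of scale, the figures from the original post can be converted into RPC calls per second and set next to the per-core figure quoted above. This is only a back-of-the-envelope sketch: it assumes decimal gigabytes, the upper figure of 8 GB per uploader, and that all 20 systems upload at that rate at the same time. It also compares call counts only, so it says nothing about how much heavier a 128 KB call is than a small one.

```python
# Back-of-the-envelope conversion of the thread's figures into call rates.
# Assumptions (not stated in the thread): decimal GB, the upper 8 GB figure,
# and all 20 uploaders running concurrently at the same rate.

CHUNK_BYTES = 128 * 1024      # 128 KB chunk per upload RPC (from the post)
UPLOAD_BYTES = 8 * 10**9      # ~8 GB uploaded per server (from the post)
UPLOAD_SECONDS = 4 * 3600     # over 4 hours (from the post)
UPLOADERS = 20                # systems uploading (from the post)

RPC_PER_CORE = 50_000         # routed RPC calls/s per core (from the reply)
MESSAGES_PER_RPC = 4          # WAMP messages per routed call (from the reply)


def upload_calls_per_second(total_bytes, seconds, chunk_bytes=CHUNK_BYTES):
    """Average number of chunk-upload RPC calls per second."""
    return total_bytes / chunk_bytes / seconds


per_uploader = upload_calls_per_second(UPLOAD_BYTES, UPLOAD_SECONDS)
aggregate = per_uploader * UPLOADERS
messages_per_core = RPC_PER_CORE * MESSAGES_PER_RPC

print(f"calls/s per uploader:    {per_uploader:.2f}")
print(f"calls/s for 20 uploaders: {aggregate:.2f}")
print(f"WAMP messages/s per core at the quoted rate: {messages_per_core}")
```

Under these assumptions the observed call rate is orders of magnitude below the quoted per-core figure, which suggests the cost per call (payload size, serialization, TLS or compression) matters more here than the number of calls.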
Both TLS and WebSocket compression (which means deflate/zlib) will obviously increase CPU load. The CPU load of TLS/zlib in general, and from Python and specifically from PyPy, is what is pushing the limits here. Crossbar.io is simply using "what is there" in terms of hardware resources and libraries.

If you need TLS, e.g. because you want to scale horizontally (over multiple machines), then besides the raw TCP network performance you might want a NIC with hardware TLS support that OpenSSL can use, and you should make sure that this OpenSSL is then the one used from PyPy in Crossbar.io.

The most efficient WAMP framing is WAMP RawSocket over Unix Domain Socket (without TLS), the most efficient normal WAMP serialization is WAMP-CBOR, and the absolute most efficient is zero-serialization using WAMP-Flatbuffers.

Hope this helps! Cheers, /Tobias
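One reason the serializer matters for file uploads: JSON has no binary type, so binary payloads sent through a JSON serializer have to be text-encoded (base64, as far as I understand the WAMP JSON rules), while CBOR has a native byte string type. The sketch below only measures the size overhead of base64 for a single 128 KB chunk using the standard library. It assumes the upload RPC currently goes through a JSON serializer, which the thread does not confirm, and it does not measure the CPU cost of encoding or decoding.

```python
import base64
import os

CHUNK_BYTES = 128 * 1024  # chunk size from the original post


def base64_size(n_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


# Random bytes stand in for a chunk of a video or other binary file.
chunk = os.urandom(CHUNK_BYTES)
encoded = base64.b64encode(chunk)

overhead = len(encoded) / len(chunk) - 1
print(f"raw chunk:    {len(chunk)} bytes")
print(f"base64 chunk: {len(encoded)} bytes")
print(f"overhead:     {overhead:.1%}")
```

Every chunk grows by about a third before it is even framed, and both ends pay for the encode and decode step on top of that.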
-
Yep! PyPy. Also: you seem to be running on vSphere, and are using a 10-year-old CPU which is EOL from Intel. Plus post the complete output from
-
Ok, I see! So may I ask: