
feat: Seamless Concurrency #46

Open
KyleSanderson opened this issue Nov 26, 2024 · 4 comments

Comments

@KyleSanderson

When I first started learning Go, the question naturally came up (as I'd expect it does for others too): do I just blindly call go for every invocation (100,000 times) and use a WaitGroup? Do I pick a magic number like 7 on an 8-thread processor? Or what do I do? (Channels! Who calls close on the send side? Oh, one per worker? Oh... channels...)

There have since been hints like GOMAXPROCS added, along with other neat libraries that read cgroup limits and set that environment variable accordingly. This is great stuff, and I want to use it for some large jobs, but I don't know what number to put in, and from the examples the choices appear quite haphazard.

If GOMAXPROCS is set, I'd prefer there be some function I can call in rill to guess the magic number of workers (maybe a parameter could be the type of workload I think it is), and if it isn't set, rill could estimate a correct amount of concurrency. Obviously there's no single right answer, but 2 on a 4-thread box is very different from 2 on a 64-thread box, and that's the crux of the issue.

I'd normally keep it tight and vague, but I've tried to explain the who/what/where/when/why of why I'd want something like this, and a gap I think can be filled easily. Uber has a library that can get this information from cgroups and similar; maybe just repackaging that, until there's a reason to fork, is a way forward.
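To make the request concrete, here is a minimal sketch of the kind of helper being asked for. Nothing like this exists in rill today; the `WorkloadKind` type, the `SuggestConcurrency` name, and the scaling factors are all hypothetical illustrations:

```go
package main

import (
	"fmt"
	"runtime"
)

// WorkloadKind is a hypothetical classification such a helper could accept.
type WorkloadKind int

const (
	CPUBound WorkloadKind = iota
	IOBound
)

// SuggestConcurrency sketches the requested helper. runtime.GOMAXPROCS(0)
// returns the current setting without changing it, so it respects both an
// explicit GOMAXPROCS env var and cgroup-aware libraries that set it.
// The multiplier for I/O-bound work is an illustrative guess, not a tuned value.
func SuggestConcurrency(kind WorkloadKind) int {
	procs := runtime.GOMAXPROCS(0)
	switch kind {
	case IOBound:
		// I/O-bound work tolerates far more in-flight operations than CPUs.
		return procs * 8
	default:
		return procs
	}
}

func main() {
	fmt.Println(SuggestConcurrency(CPUBound), SuggestConcurrency(IOBound))
}
```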

@KyleSanderson
Author

(The magic number for I/O might be 2 less than the thread count for writes, 1 less for reads, and 0 less for hard-core number crunching.)

@destel
Owner

destel commented Nov 26, 2024

There's no universal "magic number" for concurrency levels, as the optimal value depends heavily on the nature of the task and external constraints. Let me break this down:

I/O Bound Tasks

For I/O bound operations (the primary use case for rill), the optimal concurrency level is often much higher than the number of CPUs. In practice, starting with a small concurrency level like 5 or 10 often provides good results. Some examples:

  • Reading from S3: concurrency=50 or higher can work well
  • API calls to services with rate limits (e.g., Stripe API): lower concurrency to stay within limits
  • Database operations: balance between throughput and database load
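The bounded-concurrency idea behind all of these examples can be sketched with just the standard library (rill exposes the same knob as an explicit concurrency parameter on its functions). Here `fetchObject` is a placeholder for a real I/O call such as an S3 read:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchObject stands in for a real I/O call, e.g. an S3 GetObject (hypothetical).
func fetchObject(key string) string {
	return "data:" + key
}

// fetchAll runs fetchObject for every key with at most `concurrency`
// requests in flight, using a buffered channel as a semaphore.
func fetchAll(keys []string, concurrency int) map[string]string {
	sem := make(chan struct{}, concurrency)
	results := make(map[string]string, len(keys))
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	for _, k := range keys {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			v := fetchObject(k)
			mu.Lock()
			results[k] = v
			mu.Unlock()
		}(k)
	}
	wg.Wait()
	return results
}

func main() {
	// 3 here plays the role of the tuned value: 50+ for S3, lower for
	// rate-limited APIs, as discussed above.
	out := fetchAll([]string{"a", "b", "c", "d", "e"}, 3)
	fmt.Println(len(out))
}
```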

The settings are typically determined through trial and error, considering:

  • Service rate limits and quotas
  • Available system resources (memory, network)
  • Impact on other system components

For databases specifically, both concurrency and batch size need to be tuned together to find the right balance between making the application fast while not overwhelming the database.
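A minimal illustration of the batching half of that trade-off: splitting work into fixed-size chunks so each chunk becomes one bulk database operation. The `batch` helper below is a generic sketch, not rill's API:

```go
package main

import "fmt"

// batch splits items into chunks of at most `size` elements. Tuning this
// size together with the worker count is the balance described above:
// larger batches mean fewer round trips but heavier individual queries.
func batch(items []int, size int) [][]int {
	var out [][]int
	for len(items) > 0 {
		n := size
		if len(items) < n {
			n = len(items)
		}
		out = append(out, items[:n])
		items = items[n:]
	}
	return out
}

func main() {
	ids := make([]int, 10)
	for i := range ids {
		ids[i] = i
	}
	for _, b := range batch(ids, 4) {
		fmt.Println(len(b)) // each chunk would become e.g. one bulk INSERT
	}
}
```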

CPU Bound Tasks

For CPU-intensive operations, the approach depends on the task size:

  • Large tasks (e.g., PDF to HTML conversion): setting concurrency=GOMAXPROCS is usually optimal
  • Small tasks (brief arithmetic operations): avoid channels altogether due to overhead. For these cases, it's more efficient to use raw slices+goroutines or other approaches
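The "raw slices + goroutines" approach for small CPU-bound tasks might look like this sketch: one goroutine per available CPU, each summing a contiguous chunk, with no per-item channel traffic:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum splits nums into one chunk per GOMAXPROCS and sums each
// chunk in its own goroutine. Each goroutine writes only to its own slot
// in `partial`, so no mutex is needed.
func parallelSum(nums []int) int {
	workers := runtime.GOMAXPROCS(0)
	chunk := (len(nums) + workers - 1) / workers // ceiling division
	partial := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(nums) {
			break
		}
		hi := lo + chunk
		if hi > len(nums) {
			hi = len(nums)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			for _, v := range nums[lo:hi] {
				partial[w] += v
			}
		}(w, lo, hi)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i + 1
	}
	fmt.Println(parallelSum(nums)) // 500500
}
```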

Please let me know if this answer was helpful.

@KyleSanderson
Author

Your I/O example was network-based, and I agree that where there's bad code (no pipelining), you want more requests in flight than cores. But I disagree that there's no reasonable rule of thumb a library could provide. A big disclaimer in the function's doc comment would be fine.

@destel
Owner

destel commented Nov 26, 2024

I apologize - I initially misunderstood your request. You're suggesting adding a helper function that provides sensible defaults for concurrency levels based on different workloads.

While I appreciate the suggestion, the library is intentionally designed to let users explicitly choose the concurrency level based on their specific needs.
