Consider backing Inko processes by OS threads #690
Comments
Somewhere in the last two years I hacked together a small PoC that replaced the scheduler with a 1:1 scheduler. At the time this resulted in a small increase in execution times for the test suite, but this was when we were still using an interpreter. This setup also didn't reuse any threads, so I suspect most of the extra time was spent just starting threads.
Here's a simple and admittedly poorly implemented example of amortizing the thread spawn cost by reusing threads:

```rust
use std::sync::mpsc::channel;
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

// Spawn a brand new thread for every unit of work, measuring the time
// between sending a message and the thread observing it.
fn naive() {
    let mut i = 0;
    let mut fastest = Duration::from_secs(100);

    while i < 50_000 {
        let (input_send, input_rec) = channel();
        let (output_send, output_rec) = channel();

        input_send.send(Instant::now()).unwrap();

        thread::spawn(move || {
            output_send
                .send(input_rec.recv().unwrap().elapsed())
                .unwrap();
        });

        let time = output_rec.recv().unwrap();

        if time < fastest {
            fastest = time;
        }

        i += 1;
    }

    println!("naive: {:?}", fastest);
}

// The same workload, but the channels (and thus the thread behind them)
// are kept in a pool and reused instead of spawning a thread per iteration.
fn reused() {
    let mut i = 0;
    let mut fastest = Duration::from_secs(100);
    let reusable = Mutex::new(Vec::with_capacity(32));

    while i < 50_000 {
        let (input, output) = {
            let mut threads = reusable.lock().unwrap();

            if let Some(res) = threads.pop() {
                res
            } else {
                let (input_send, input_rec) = channel::<Instant>();
                let (output_send, output_rec) = channel::<Duration>();

                thread::spawn(move || loop {
                    if let Ok(t) = input_rec.recv() {
                        let _ = output_send.send(t.elapsed());
                    } else {
                        break;
                    }
                });

                (input_send, output_rec)
            }
        };

        input.send(Instant::now()).unwrap();

        let time = output.recv().unwrap();

        // Return the channels to the pool so the thread is reused.
        reusable.lock().unwrap().push((input, output));

        if time < fastest {
            fastest = time;
        }

        i += 1;
    }

    println!("reused: {:?}", fastest);
}

fn main() {
    naive();
    reused();
}
```

Running this in release mode, the "reused" time varies a bit between 500 nsec and 1 µsec, but it highlights how easily you can reduce the spawn cost by just reusing threads. Even assuming a real and accurate implementation needs some extra bookkeeping (the above version only ever spawns a single thread and always reuses it), we'd still be looking at at least a 10x improvement. The context switch cost remains, but I'm willing to bet that for 95% of the applications out there this is a non-issue to begin with.
Another point to consider: green threads typically come with smaller growable stacks, such that the initial amount of (virtual) memory they need is smaller. However, Inko's stack sizes are fixed at 1 MiB by default, as resizing stacks comes with its own overhead and complicates code generation (= you have to ensure the stack size check always comes first in every function).
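To illustrate why growable stacks complicate code generation, here's a rough sketch of such a prologue check. This is not Inko's actual codegen; the names and the growth routine are hypothetical stand-ins:

```rust
// Hypothetical stand-in for the runtime routine that switches to a bigger
// stack; a real implementation would reallocate and copy or remap pages.
fn grow_stack() {
    unimplemented!("runtime stack growing routine");
}

// The check itself, written as a plain function for illustration; a real
// compiler emits this inline at the start of every generated function.
fn stack_check(stack_limit: usize) {
    let local = 0u8;
    let approx_sp = &local as *const u8 as usize; // rough stack pointer

    if approx_sp < stack_limit {
        grow_stack();
    }
}

fn some_generated_function(stack_limit: usize) {
    stack_check(stack_limit); // must run before any other code
    println!("function body runs with enough stack");
}

fn main() {
    // With a limit of 0, the check never triggers growth in this sketch.
    some_generated_function(0);
}
```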
An argument against one thread per Inko process is a less consistent experience: running many OS threads requires tuning of various operating system settings, such as limits on the maximum number of threads and processes.
Another argument against OS threads in the context of FFI: pinning an Inko process to an OS thread isn't a great approach to handling C libraries that require running on the same thread, but it's also not that big of a deal. We could also change the scheduler such that the main process always runs on the same thread, and not offer a generic pinning mechanism. This is easy enough to implement and sufficient for using libraries that must run on the same thread.
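As an aside, a thread-affine library can also be handled in user code by dedicating a single OS thread to it and funnelling all calls through a channel. A minimal sketch, where `thread_affine_library_call` is a hypothetical stand-in for the real C call:

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical stand-in for a call into a C library that must always be
// invoked from the same OS thread (as many GUI and graphics libraries do).
fn thread_affine_library_call(message: &str) {
    println!("[{:?}] {}", thread::current().id(), message);
}

fn main() {
    let (send, recv) = channel::<String>();

    // Dedicate one OS thread to the library and funnel every request
    // through the channel, so all library calls happen on that thread.
    let library_thread = thread::spawn(move || {
        for message in recv {
            thread_affine_library_call(&message);
        }
    });

    for i in 0..3 {
        send.send(format!("request {}", i)).unwrap();
    }

    // Dropping the sender closes the channel, letting the thread exit.
    drop(send);
    library_thread.join().unwrap();
}
```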
I'm going to close this for the time being. As much as I prefer the use of OS threads due to the simplicity it brings, the cost is simply too great at this stage, and this will likely remain the case for years to come.
Description
Inko's approach to concurrency is similar to that of Erlang and Go: M Inko processes are mapped onto N OS threads. For sockets we use non-blocking IO, and for files we use blocking operations coupled with a backup thread pool in case these operations take too long.
This setup is based on the general belief that M:N scheduling combined with non-blocking IO leads to better performance compared to 1:1 scheduling with blocking operations. These benefits however are debatable, highly dependent on the type of workload, and come with their own set of non-trivial trade-offs.
A benefit of green threads with an M:N scheduler is that spawning tasks is fast and efficient, such that you can spawn many of them rapidly. On paper this seems beneficial, but in practice it remains to be seen if it truly is. For example, in a typical transactional application (basically any web application), the amount of concurrency is limited not by how many or how fast you can spawn your tasks, but by the concurrency supported by the external services (e.g. a database) the transaction relies upon. This means it doesn't really matter that you're able to spawn 10 000 processes with ease if you're still limited to running only 20 concurrently, due to your database pool being limited to 20 concurrent connections.
Even if your system somehow supported unbounded/unlimited concurrency, you really don't want that in a production setting, as planning around unbounded concurrency is impossible and bound to lead to problems. In contrast, it's much easier to deal with a system that's limited to, for example, 32 concurrent tasks.
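To illustrate the point, here's a minimal sketch in which a pool of 20 "connections" (modelled with a bounded channel of tokens, a stand-in for a real database pool) caps effective concurrency at 20 no matter how many tasks are spawned:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    // A bounded channel holding 20 tokens stands in for a database pool
    // with 20 connections: a task must hold a token to "run a query".
    let (put, take) = sync_channel::<()>(20);

    for _ in 0..20 {
        put.send(()).unwrap();
    }

    // Receiver isn't Sync, so for this sketch we share it behind a mutex.
    let take = Arc::new(Mutex::new(take));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..1_000)
        .map(|_| {
            let take = Arc::clone(&take);
            let put = put.clone();
            let in_flight = Arc::clone(&in_flight);
            let peak = Arc::clone(&peak);

            thread::spawn(move || {
                // Block until a "connection" is available.
                let token = take.lock().unwrap().recv().unwrap();
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;

                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(1)); // pretend to run a query
                in_flight.fetch_sub(1, Ordering::SeqCst);
                put.send(token).unwrap(); // return the connection to the pool
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // However many tasks we spawn, at most 20 ever run "queries" at once.
    println!("peak concurrency: {}", peak.load(Ordering::SeqCst));
}
```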
Even if you could somehow solve this, green threading poses additional problems such as:
There are usually two reasons one might want to avoid the typical thread-per-request approach and instead go with the above approach:

1. The cost of context switching between OS threads
2. The cost of spawning OS threads

The cost of context switching only really matters in systems where we have fully isolated transactions that don't depend on a fixed-size pool of sorts, i.e. tasks that are purely CPU bound. But for such workloads I suspect that 1:1 scheduling is in fact better, because you don't have the cost of additional bookkeeping.
The cost of spawning threads is something one should be able to mitigate (or at least improve upon) by reusing threads: you maintain a pool of reusable threads, initially at size zero. When threads are needed, we check the pool and reuse a thread if any is present. If not, we spawn a new one. When threads finish, they enter the reusable pool for up to N seconds, after which they stop. Given a sufficiently large upper limit (e.g. 1000), the cost of spawning threads is amortized over time, with the minimal/best-case cost being the equivalent of unlocking a mutex and a `pop` from a queue of sorts.

The cost of context switching also applies even when using M:N scheduling: it's still there and we have no control over it. In certain scenarios this can make things worse, such as when a process is rescheduled only for its OS thread to be swapped out with another OS thread by the kernel. In other words, M:N scheduling doesn't solve this, it just makes it less common.
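As a rough illustration of that reuse mechanism, here's a minimal sketch of a pool whose idle workers exit after a timeout, using `recv_timeout` as the "reusable for up to N seconds" clock. The `ReusePool` type and its details are illustrative, not an actual implementation:

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send>;

// A minimal reusable-thread pool: idle workers wait up to `keep_alive` for
// more work, then exit, so the pool shrinks back to zero on its own.
struct ReusePool {
    idle: Arc<Mutex<Vec<Sender<Job>>>>,
    keep_alive: Duration,
}

impl ReusePool {
    fn new(keep_alive: Duration) -> Self {
        ReusePool { idle: Arc::new(Mutex::new(Vec::new())), keep_alive }
    }

    fn spawn(&self, mut job: Job) {
        // Reuse an idle worker if one is available.
        if let Some(worker) = self.idle.lock().unwrap().pop() {
            match worker.send(job) {
                Ok(()) => return,
                // The worker timed out in the meantime; reclaim the job.
                Err(error) => job = error.0,
            }
        }

        // No reusable worker, so spawn a fresh one.
        let (send, recv) = channel::<Job>();
        let idle = Arc::clone(&self.idle);
        let keep_alive = self.keep_alive;

        send.send(job).unwrap();

        thread::spawn(move || loop {
            match recv.recv_timeout(keep_alive) {
                Ok(job) => {
                    job();
                    // Make ourselves available for reuse.
                    idle.lock().unwrap().push(send.clone());
                }
                // Idle for too long: let the thread exit.
                Err(_) => break,
            }
        });
    }
}

fn main() {
    let pool = ReusePool::new(Duration::from_secs(5));

    for i in 0..4 {
        pool.spawn(Box::new(move || {
            println!("job {} ran on {:?}", i, thread::current().id());
        }));
    }

    // Crude wait so the example's jobs get a chance to run.
    thread::sleep(Duration::from_millis(100));
}
```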
I've been thinking about this over the years, but the more I think about it, and the more challenges I encounter with the M:N scheduler, the more I think we should move to a 1:1 scheduler with the above thread reuse mechanism. The benefits are numerous:
- `Channel` could be simplified, as we can now just use a regular condition variable and mutex for blocking processes on channels (see the sketch below)

Of course, at the language level nothing would change: processes would still be lightweight processes (because they are more lightweight compared to OS processes), and the way you use channels/etc would remain the same. You'd also still spawn processes per transaction where possible, it's just that now each process is backed by a dedicated OS thread. In other words, the use of 1:1 scheduling is just an implementation detail transparent to the language.
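For illustration, a minimal sketch of what such a condition variable and mutex based channel could look like, using a `Mutex<VecDeque<T>>` as the buffer. The names and structure are illustrative, not Inko's actual runtime code:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// A minimal blocking channel built from a mutex and a condition variable:
// with 1:1 scheduling, a receiving process can simply block its OS thread
// on the condvar instead of going through a userspace scheduler.
struct Channel<T> {
    queue: Mutex<VecDeque<T>>,
    ready: Condvar,
}

impl<T> Channel<T> {
    fn new() -> Self {
        Channel { queue: Mutex::new(VecDeque::new()), ready: Condvar::new() }
    }

    fn send(&self, value: T) {
        self.queue.lock().unwrap().push_back(value);
        self.ready.notify_one();
    }

    fn recv(&self) -> T {
        let mut queue = self.queue.lock().unwrap();

        loop {
            if let Some(value) = queue.pop_front() {
                return value;
            }

            // Block the calling thread until a sender notifies us.
            queue = self.ready.wait(queue).unwrap();
        }
    }
}

fn main() {
    let chan = Arc::new(Channel::new());
    let sender = Arc::clone(&chan);

    let handle = thread::spawn(move || {
        for i in 0..3 {
            sender.send(i);
        }
    });

    for _ in 0..3 {
        println!("received: {}", chan.recv());
    }

    handle.join().unwrap();
}
```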
Related work
Issues we could close
Assuming we drop the use of green threading, the following issues could be closed due to no longer being relevant: