.NET 9 Runtime Async Experiment #94620
Comments
FYI (maybe unrelated, since this issue is about the runtime rather than the language): effect handlers are the newest way to express things like async, EH, etc. in a natural way; for async in particular, they don't require splitting APIs into async and sync parts. You can see how effect handlers are used to implement async/await in C++ in the relevant section of the paper.
I admit I'm only vaguely familiar with the effect typing literature. I've read part of Daniel Hillerström's dissertation, and looked through the Effekt language a bit. I'm not sure there's anything directly interesting to the runtime there. In particular, the type system innovations I've seen in effect handlers seem to mostly be in the compiler front-end portion. Backend implementation appears to rely on a universal suspend-resume functionality implemented by the language runtime. In this case we're interested in innovating directly in that suspend-resume primitive. And we're mostly interested in how it affects async in particular. We've found that the most pervasive effect that impacts performance is async, so we want to address it specifically.

For us to analyze a more general primitive I think we would want evidence that 1) that primitive is useful at the language level, namely that a language like C# would want first-class effects, and 2) the general-purpose primitive is equivalent in performance to the one we would implement for async. If (2) is not true then we would be leaving async performance on the table to handle user-defined effects, which currently seems like a bad trade-off.
... Midori did have at least a throws/async effect primitive, mostly. They didn't implement a general effect primitive (there are some notes near the bottom of Joe Duffy's blog post about Midori's error model).
In Midori we did have

```
delegate void Action();
delegate throws void ActionThrows();
delegate async void ActionAsync();
delegate async throws void ActionThrowsAsync();
```

That is ... interesting, but I don't think it will really impact this proposal very much.
Why not consider integrating
We like C# 😄
@davidfowl Awesome. We're rooting for you!
Do you think C# is worse than Rust in terms of "Memory safety and Async throw deep" 😕?
Actually, you can definitely write memory-unsafe code without a single line of `unsafe`:

```rust
use std::marker::PhantomData;

struct Bounded<'a, 'b: 'a, T: ?Sized>(&'a T, PhantomData<&'b ()>);

fn helper<'a, 'b, T: ?Sized>(input: &'a T, closure: impl FnOnce(&T) -> Bounded<'b, '_, T>) -> &'b T {
    closure(input).0
}

fn extend<'a, 'b, T: ?Sized>(input: &'a T) -> &'b T {
    helper(input, |x| Bounded(x, PhantomData))
}

fn main() {
    let s = String::from("str");
    let a: &'static str = extend(s.as_str());
    drop(s);
    println!("{a}"); // <--- use after free, with no unsafe code and no compile error at all
}
```

That said, since we're here to talk about the runtime async experiment in .NET 9, C# vs. Rust comparisons are off-topic and should probably be avoided.
Recently, I was reading about the LMAX Disruptor (https://lmax-exchange.github.io/disruptor/disruptor.html), and maybe it could be used to implement an async call mechanism. The main idea would be to have some real OS threads working as consumers (they could also be .NET Tasks) and one main RingBuffer acting as the buffer of async calls. The RingBuffer would be made up of objects (structs could be used so the data is stored inline instead of only a pointer) that would hold:
Those consumers would keep searching for new messages on the RingBuffer and, based on the number identifying the called function, would call the respective function, passing the arguments to it. There could be another RingBuffer for the results, or the same one could be used by adding a new piece of information to the message. There are two possible strategies for the consumers:
For some scenarios it could be interesting to allow changing the strategy dynamically (if a period of high throughput is foreseen). For that we could add another boolean to the message, defining the strategy the thread should use from then on. One thing about the RingBuffer, as stated in the paper, is its zero-allocation nature, because the system preallocates the whole ring. This (as stated in the paper) helps with CPU cache hits, because the cache is structured in cache lines, so there is a high probability that on the next iteration (busy spin) or interruption (which won't apply much here) the next message on the RingBuffer, with the information needed to call the method, will already be ready. I'm not experienced in compilers and CPU design (although I've made some very awful ones during college), so the implementation idea for the Disruptor pattern I've outlined may not be the best... and the whole idea may also be of no use.
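To make the suggestion above a bit more concrete, here is a rough, hypothetical sketch of what such a message slot and a busy-spinning consumer could look like. All names are invented, and it deliberately ignores wrap-around and multi-producer concerns that a real Disruptor-style ring handles:

```csharp
using System;
using System.Threading;

// Hypothetical message slot: a preallocated array of these structs forms the ring.
struct CallMessage
{
    public int OperationId;   // number identifying which function to call
    public object? Argument;  // arguments for that function
    public int Published;     // 0 = empty, 1 = ready to be consumed
}

sealed class CallRing
{
    private readonly CallMessage[] _buffer;   // allocated once, up front
    private int _producerIndex;

    public CallRing(int size) => _buffer = new CallMessage[size];

    // Single producer assumed; publishes a call into the next slot.
    public void Publish(int operationId, object? argument)
    {
        int i = _producerIndex++;
        _buffer[i].OperationId = operationId;
        _buffer[i].Argument = argument;
        Volatile.Write(ref _buffer[i].Published, 1);
    }

    // Busy-spin consumer, matching the "high throughput" strategy described above.
    public void Consume(Action<int, object?> dispatch, int count)
    {
        for (int i = 0; i < count; i++)
        {
            while (Volatile.Read(ref _buffer[i].Published) == 0)
                Thread.SpinWait(1);           // spin until the slot is published
            dispatch(_buffer[i].OperationId, _buffer[i].Argument);
        }
    }
}
```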
I don't think a ring buffer is a better solution for async. You don't expect async tasks to complete in the same order they were created. If we choose another data structure instead of a ring buffer, we are basically building another GC in the BCL, and I don't think it would beat the .NET GC, which the state machine objects in async tasks currently use. Zero allocation does not always mean better performance. Regarding OS threads as consumers, .NET already has a customizable task scheduler.
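For readers unfamiliar with that last point: the scheduling policy is already pluggable today via `System.Threading.Tasks.TaskScheduler`, without any runtime changes. A minimal illustrative sketch (not production code; `DoWork` in the usage comment is a placeholder) that runs queued tasks on a single dedicated thread:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Minimal custom scheduler: every queued Task runs on one dedicated thread.
sealed class SingleThreadScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _queue = new();

    public SingleThreadScheduler()
    {
        var thread = new Thread(() =>
        {
            foreach (var task in _queue.GetConsumingEnumerable())
                TryExecuteTask(task);
        }) { IsBackground = true };
        thread.Start();
    }

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Never inline; keep everything on the dedicated thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();
}

// Usage:
//   Task.Factory.StartNew(() => DoWork(), CancellationToken.None,
//       TaskCreationOptions.None, new SingleThreadScheduler());
```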
From the comments, it is not clear to me what the playground and the products of this experiment are. My understanding is that an await instruction will be introduced in IL, which will initially produce almost the same assembly code that the current (compiler-generated) implementation eventually creates, and that's it. Later that code can change, inlining some state methods or taking advantage of the target environment, i.e. Server 2025 or Windows 12 may add support for async
@panost This doesn't sound correct. It sounds like you're imagining that we're starting off with a 1:1 translation of the Roslyn state machine into the runtime implementation, and we'll adjust from there. That's not true. Roslyn is limited in a lot of things it can do. For example, the only way for .NET IL to interact with exceptions is through the try/catch/finally/etc. infrastructure. This requirement features very heavily in the Roslyn async state machine generation. In contrast, the runtime is bound by none of these restrictions. Exceptions need to be handled, of course, but notions of stack frames and EH blocks are much more... flexible. We expect the runtime async code to be similar to the Roslyn code in observable behavior, identical if possible, but it will quite likely take a number of shortcuts in machine code that would not be possible in any IL representation.
It is also limited in handling cancellation of concurrent tasks. I think it would be a wasted opportunity not to consider Structured Concurrency concepts in this experiment. |
@agocke Could this help DLR-based languages (like PowerShell) better manage task execution at runtime? I know that for you, PowerShell is not a .NET language and doesn't have any business value, but we are a large community that doesn't receive much technical consideration from the runtime. Today, it would be good news if that changed a bit.
@agocke Nice! Edit: Sorry, I just noticed that you edited the first post and added some links with more details. They answer most of my questions.
Great choice to try putting the async model directly into the runtime and take it a step further. From my perspective, C#'s async experience is the best built-in experience among current languages. Will it bring better exception handling and a better debugging experience when the async task is handled directly by the runtime? Currently, the debugging experience gets significantly worse when I put code in async methods, for example the stack frames. And the Rust folks are all over the place nowadays. 😄
To whoever is reading the code snippet above and is confused about why: P.S.: Anyone with further questions about that Rust bug above should, I suggest, discuss them on the issue page I linked above rather than here.
I have been working on this problem for a while now: a scheduler that uses the latest async/await patterns to do the job instead of the current scheduler, which uses old-school queue logic. (Thread management in the C# runtime happens through a scheduler that can be replaced with a custom one containing the .NET 9 runtime experimental async/await bits. The solution you call for would be implemented in a scheduler.) Turns out having multiple schedulers unearths all kinds of strange bugs that otherwise won't happen. Fun. Here is another take on an async-only scheduler that might turbocharge your stuff. It uses a zero-allocation design (C++ performance tricks) to improve scheduling performance (in a GC sense). It attempts to be a drop-in replacement for the runtime scheduler; however, it needs the runtime scheduler to do its underlying horizontal scaling. But you could replace the default scheduler with some other system, some custom interop bits maybe, I don't know. This way you can be totally rid of the runtime scheduler (read: thread management). A low (read: zero in this case) GC-pressure scheduler is something that might only be possible in async-only mode. It uses async enumerators to do the trick, enabling many low-consumption threads. What color are my threads? They are all async/await. Anything else is madness. EDIT - Just as a side note: inside that scheduler you will find a good demonstration of the use of
I'm still going through the doc, but 2 things jumped out at me.
The behavior change is surprising and likely breaking. Some recursive async-lock implementations rely on the fact that the async-local state is reverted (see the sketch just after this comment for the current behavior). I know recursive async locks are fundamentally unsound as they are, but people still use them. Maybe we can get a proper async lock with these changes (or regular locks will just work)?
So,
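For reference, a small self-contained illustration of the current compiler-async behavior referred to above: a value written to an `AsyncLocal<T>` inside an async method is reverted when control returns to the caller.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static readonly AsyncLocal<int> Depth = new();

    static async Task Main()
    {
        Depth.Value = 1;
        await InnerAsync();
        // Today the write inside InnerAsync does not flow back out:
        Console.WriteLine(Depth.Value); // prints 1 with compiler-async
    }

    static async Task InnerAsync()
    {
        Depth.Value = 2;                 // visible only within this method's flow
        await Task.Yield();
        Console.WriteLine(Depth.Value);  // prints 2
    }
}
```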
Fixed that for you.
My 2c: I'm opposed to language changes where "magic" happens in a way the programmer doesn't exactly understand, and where they lose control. The way C# and .NET were designed, having the async and await keywords, is a good thing, as green threads are too. They are THE solution to specific problems. Having used greenlets on CPython 2, being restricted to a few I/O calls was a non-issue... To my knowledge there are at least TWO .NET projects which could benefit from "native" runtime support: Akka.NET and IronPython. In the case of Akka.NET the implementation itself might become easier and more performant. Done right, it might even enable things not possible yet, like creating an Actor using a UI-Thread-Dispatcher from an Actor dispatched on a non-UI thread... IronPython could then support async/await at all and thus become more Python 3 compatible... Please contact the maintainers of those projects; maybe they have a "wishlist" to build upon...
@minesworld If they don't care about Microsoft PowerShell, I don't think they care about IronPython, which was abandoned by them. Java is far better
I couldn't see an answer to your request. So maybe they are contemplating it...
Another one to ask for a "wishlist"... the more, the better, since the final solution might get more general.
Might be, but a WinUI3 frontend would still be in C# or C++...
I like .NET too; the current async/await approach is already a nice compromise between automation and control, even though things can possibly be improved further. However, when it comes to control, I do also like the extremely efficient and quite simple way Rust handles async/.await. In Rust, async methods return Futures which do nothing until actively polled; polling is done at the moment of awaiting. That way they do not automatically spawn heap-allocating background tasks every time (a .NET ValueTask is only cheap when it finishes synchronously and immediately, not when it suspends). The polling and scheduling in Rust is done by an executor/runtime that you as the developer choose (or even write yourself if you like). From what I have seen of Java, they do pretty much the same with their virtual threads, with an Executor and Futures as well. But I believe they don't have any await unblock points in the code; it's really limited to spawning, if I am right. They do more scheduling work directly inside the IO tasks themselves, which is also pretty interesting actually. I actually think it's great that they are now searching for an async 2.0 in .NET. And as always, other languages and technologies can be a great inspiration too.
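As a side note on the `ValueTask` point above, here is a small sketch (a hypothetical cache example) of why it is only cheap on the synchronous path: in typical release builds the state machine struct is not boxed when the method completes synchronously, but it is moved to the heap as soon as it awaits something that is not already complete.

```csharp
using System.Threading.Tasks;

class Cache
{
    private int? _cached;

    public async ValueTask<int> GetValueAsync()
    {
        if (_cached is int v)
            return v;            // synchronous completion: no heap allocation

        await Task.Delay(10);    // suspension: the state machine is boxed
        _cached = 42;
        return _cached.Value;
    }
}
```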
Both awaits are great and there is no need to make a separate keyword.
We're still tweaking things. The main benefit is that you can gain a lot of perf by not jumping async locals back after every method. Ideally we'd keep compat and provide some option to swap behavior and gain perf, but honestly we're still not sure what that would look like. That'll need some intensive design time.
No, this is just a runtime restriction. We'll likely push this to Roslyn and have them fix it at the compiler level by lifting out of the direct scope of the clauses. Roslyn already does this for certain constructs that are not allowed in EH blocks, or to meet requirements like zero stack depth on exit (a conceptual sketch of that kind of lifting follows below).
There will absolutely not be an
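To illustrate the kind of lifting mentioned above (this is a conceptual shape only, with invented method names; Roslyn's actual lowering is more involved and uses its own helpers):

```csharp
using System;
using System.Threading.Tasks;

class Example
{
    // What the user writes: an await inside a catch clause.
    static async Task HandleAsync()
    {
        try { await DoWorkAsync(); }
        catch (Exception ex) { await LogAsync(ex); }
    }

    // Conceptually equivalent shape with the await lifted out of the EH block.
    static async Task HandleLiftedAsync()
    {
        Exception? pending = null;
        try { await DoWorkAsync(); }
        catch (Exception ex) { pending = ex; }
        if (pending is not null)
            await LogAsync(pending);
    }

    static Task DoWorkAsync() => Task.FromException(new InvalidOperationException("boom"));
    static Task LogAsync(Exception ex) { Console.WriteLine(ex.Message); return Task.CompletedTask; }
}
```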
Definitely interested in Rust innovations. I'm still learning how the Rust system works. If polling provides some portable benefits it would be good to adapt what we can (without egregiously complexifying the model). One design difference is that I don't think we will provide a completely "bare bones" feature like Rust would. Rust has to care about things like
IMO .NET async/await is a finished job. If the runtime can just be a little bit more awesome and pool/reuse that singular task malloc in

There is no work to be done here.
First, for the good stuff: if green threads are actually going to be implemented as currently proposed, it would completely solve one of my biggest complaints about the current async model, which is the lack of the ability to "tail call" a continuation (technically possible if custom Task types are used, but integration with existing libraries would be a nightmare). This is a big plus for me. However, I have some major worries about the currently proposed green threads.
@DWVoid You might be confused...
@FlashyDJ Actually, what I was referring to as 'green thread' is the async2 experiment. If you look at its details, it is basically a kind of "green thread" implementation with caveats. Sorry for the confusion in my wording.
@DWVoid Green threads means capturing the whole native stack at the suspension point and then restoring it.
Am I correct in thinking that this sounds like an exploration of a runtime-provided primitive for delimited continuations, akin to what Java shipped in JDK 21 underpinning virtual/green threads?
Could the Cancellation and ConfigureAwait context concepts currently be achieved using

```csharp
public static class AsyncContext {
    public static void PushConfigureAwait(bool capture);
    public static void PushCancellationToken(CancellationToken token);
    public static bool ContinueOnCapturedContext { get; }
    public static CancellationToken CancellationToken { get; }
}
```

Actually it would be nice if we could have
Thanks everyone for your interest! We've completed the experiment and I've updated the summary with our results and future plans. Hope to have more soon.
How does the JIT approach work with .NET AOT?
The same JIT that's used at runtime by CoreCLR is used at compile time by Native AOT and crossgen, so we would share the implementation between them.
The doc states that the new model will support |
No conclusion: ideally it would work for all task-like types, but I can't speak to the implementation requirements for that. Note that the design is that, unless the task is yielded, the Task object is essentially removed in the code generation. So in some sense, custom task-like types will be much less important.
Not really. Delimited continuations allow retrofitting a "conventional" calling convention with arbitrary suspend-resume functionality. We're more interested in producing a new "async" calling convention that hides all the details underneath. It wouldn't be appropriate for implementing green threads, since it's a different calling convention from the conventional one, and it's not generalized, so it couldn't be used to implement other coroutine-like functionality.
I would love to hear more about the details and findings. Would it be possible to have a deep dive in an upcoming languages and tools community standup?
Update:
We've now completed the initial experiment into runtime-async. Overall, the experiment was very successful. For details, see https://github.com/dotnet/runtimelab/blob/feature/async2-experiment/docs/design/features/runtime-handled-tasks.md. We tested two possible implementations: a VM implementation and a JIT implementation. Of the two, we are more positive about the JIT implementation, both for performance and for maintenance.
Our primary conclusion is that runtime-async is at least as good as compiler-async in all the configurations that we measured. In addition, we believe that the new implementation can be fully compatible with the existing compiler-async, meaning that runtime-async can be a drop-in replacement.
We would like to graduate this experiment to a new runtime feature. However, this is a large feature which may take multiple releases to complete. In order to transparently replace the compiler-async implementation we will have to implement all the existing functionality in runtime-async.
For now we'll close out the experiment with the detailed results listed in the link above, and plan to publish more information on runtime-async planning as things become more concrete.
Intro
In .NET 8 we started an experiment in adding support for green threads to .NET. We learned a lot, but decided not to continue at the current time.
In .NET 9 we'd like to take what we learned and explore performance and usability improvements of the existing .NET `async`/`Task` threading model.

Background

From the C# and .NET libraries level, there are two supported threading models: OS threads, and C# `async`/`Task`. Concurrent code can access each of these models using the `System.Threading.Thread` type or C# `async`/`Task.Run`, respectively. For most modern C# code we recommend using `async` and `Task` if you need blocking-free waiting or concurrency.
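To make the two models concrete, here is a minimal sketch (an invented example, not from the issue) contrasting a blocking wait on a dedicated OS thread with a non-blocking wait using `async`/`Task.Run`:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class TwoModels
{
    static async Task Main()
    {
        // OS-thread model: a dedicated thread that blocks while it waits.
        var thread = new Thread(() =>
        {
            Thread.Sleep(100);                       // blocking wait
            Console.WriteLine("done on an OS thread");
        });
        thread.Start();
        thread.Join();

        // async/Task model: the wait is non-blocking and the thread is freed.
        await Task.Run(async () =>
        {
            await Task.Delay(100);                   // non-blocking wait
            Console.WriteLine("done on the thread pool");
        });
    }
}
```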
The Experiment

The status and code for the in-progress experiment can be found here: https://github.com/dotnet/runtimelab/tree/feature/async2-experiment
An ongoing design doc is present in: https://github.com/dotnet/runtimelab/blob/feature/async2-experiment/docs/design/features/runtime-handled-tasks.md
While `async` and `Task` are the newest and most-preferred option at the C# and .NET libraries level, they are not a direct part of the .NET runtime threading model. This experiment asks the question: what if they were? Rather than implement the `async` structure entirely in C# as a state machine rewrite, we are interested in exploring direct runtime integration with async methods.

The characteristics we're interested in for this experiment are:
- Throughput
  - Microbenchmarks -- how much does `await` cost?
  - Lots of nested awaits?
  - Frequent suspension vs. rare suspension
- Compatibility
  - Are the semantics similar/identical to C#?
  - Cost of switching
- Code size
  - IL size
  - Crossgen/Native AOT code size
As we explore more we might find more questions. At the moment, we're not planning to investigate things which require a lot of additional implementation work.