
Rust guest async bindings: low-overhead streams with no copies difficult to make memory safe #471


Closed
alexcrichton opened this issue Mar 14, 2025 · 10 comments


@alexcrichton
Collaborator

Whether or not we'll change anything as a result of this, I'm not sure. In thinking more about how bindings will work in Rust I'm realizing that many possible bindings to futures/streams are all memory-unsafe in Rust and would require unsafe (which is something we want to avoid). The tl;dr is:

  • "Leaking" an object in Rust is safe. I can go more into why, but the basic assumption in rust is that destructors are optional and cannot be required for memory safety.
  • Ideally we'd have a binding for stream<u8> that looks something along the lines of async fn write(&mut self, bytes: &[u8]) -> Result<()> which implicitly borrows bytes for the duration of the entire async function.
  • This was originally going to be made sound by having a destructor of the returned future which cancelled the write, but that's not possible because it's possible to leak the future and not run its destructor.

Effectively all interaction with futures/streams will have to take ownership of buffers temporarily while the operation is in progress. AFAIK that's pretty un-idiomatic in Rust and gets quite cumbersome, but there's effectively no other option for memory-safe code. It basically means that the Rust APIs will have to get opinionated very quickly which isn't a great sign for foundational APIs.

There's not really anything that can be done about this at the component model level apart from radically redesigning things, which is more-or-less off the table at this point. Otherwise I mostly wanted to note this down as a consequence of an io_uring-style API (and AFAIK io_uring has no low-level safe Rust bindings either, probably for similar reasons).

@lukewagner
Member

lukewagner commented Mar 14, 2025

Just to try to understand the constraint space: is there any safe way to have an async function that borrows a mutable slice?

@alexcrichton
Collaborator Author

Sort of, but from the component model's perspective I don't think so.

In Rust there's nothing wrong with this signature for example:

async fn foo(a: &mut [u8]) { /* ... */ }

The requirements here are specifically:

  • When foo is called it returns a future with an unnameable type, let's say F
  • The slice a is contained within F, and F is considered to close over, or borrow, a
  • This means that while F is alive (e.g. during a .await) it's a compile-time error to access a.
  • If F goes out of scope, however, then a is no longer considered borrowed and it can be accessed.

The "goes out of scope" typically means that something dropped it (or consumed it, etc). One way of going out of scope though is being leaked, and leaking notably does not run any destructor for F.

So ideally something like this:

world foo {
    import f: func(x: list<u8>);
}

would show up in Rust as:

async fn f(x: &[u8]) { /* ... */ }

The problem is that the pointer here is handed off to the component model inside of f. That's only sound with respect to the rest of Rust if, when the returned future goes out of scope, the pointer is no longer being accessed. In normal Rust, if the future leaks then that's fine because nothing will ever touch the pointer, so it's just inert. For the component model it's a problem because the way we were going to make this sound, a destructor that cancels the component model operation, is not guaranteed to run. Effectively, safety in Rust cannot rely on a destructor running; otherwise the code is unsound.


The crux of this is basically that a pointer is handed off to some external system, and the safety of the operation relies on the fact that the operation can be reliably cancelled if necessary, and that can't happen in Rust 100% of the time. It'll be right 99% of the time because leaks generally don't happen, but in terms of default/quality bindings I'd want to reach 100%. To the best of my knowledge this isn't a major problem in the Rust async ecosystem since APIs like io_uring or overlapped I/O are deep within runtimes and never exposed "raw" to users (or when they are they're appropriately unsafe). That means that although futures can be leaked at any time it's sound because nothing will actually access anything the future references (as it's leaked).

The only solution I can think of is to change the bindings mode for all async APIs to requiring "owned" variants, meaning above the bindings would show up as:

async fn f(x: Vec<u8>) { /* ... */ }

The ramifications of a signature like this are that dealing with buffers is far less efficient as you have to copy something out just to give it to a function and there's no easy way to reclaim the buffer after f finishes. Personally I'd also find it pretty unfortunate to be forced to do this and penalize all async users for the niche use case that someone might accidentally leak something. That being said I also don't see an alternative...

@dicej
Collaborator

dicej commented Mar 14, 2025

async fn f(x: Vec<u8>) { /* ... */ }

We can also support e.g. bytes::Bytes (which I did recently for host bindings to avoid extra allocations/copies when hooking wasi:http up to Hyper), or a reusable buffer type based on e.g. Rc<RefCell<Vec<u8>>> to support buffer reuse in the common case. Not nearly as elegant as dealing with slices, of course, but there are workarounds to avoid allocations and copies in a hot-path situation.

@lukewagner
Member

Ohhhh, so just to help me bottom out here: is the critical detail that makes foo safe the fact that, if the returned F value goes out of scope by being leaked, foo will not continue running (b/c its poll is not being called), and that's why it's ok for the caller to regain ownership when F goes out of scope? If so: if you wanted to implement foo by running code in a background task that uses a, how would you (even with unsafe) implement that, or is that not possible without an extra intermediate copy (that happens in the poll and thus never happens if F is leaked and poll is never called)?

@alexcrichton
Collaborator Author

We can also support e.g. bytes::Bytes

That's true, yeah, but it doesn't generalize to list<T>. That means it's making code generation, which is already a beast, even more complicated with more special cases. Not that I don't think we should do this, just that every problem ending in "one more option to bindgen" makes bindgen exponentially more difficult every time we add an option.

Ohhhh, so just to help me bottom out here

Correct yeah.

if you wanted to implement foo by running code in a background task that uses a, how would you (even with unsafe) implement that, or is that not possible without an extra intermediate copy (that happens in the poll and thus never happens if F is leaked and poll is never called)?

To the best of my knowledge of modern Rust idioms, you're correct that the only sound way to implement this is to copy data. Using unsafe to share the pointer of a with some background task would fundamentally rely on a destructor running, which is out of your hands once it's connected to a returned future.

Another way of putting this is that it's impossible to create an un-leakable type in Rust right now; every type may be leaked.


While this is somewhat orthogonal, I want to also clarify one thing lest anyone in the future read this thread and conclude that Rust is fundamentally broken. The Rust compiler won't randomly leak values for example, and Rust, again to the best of my knowledge, guarantees a few things around destructors:

  • When you destroy a value T it destroys all the aggregate fields of T as well.
  • When you have a value of type T on the stack and that value "goes out of scope" it's guaranteed to be destroyed.

That's why everything generally works in Rust with no leaks, as everything is eventually rooted in the stack most of the time. For the 1% of the time this is a problem there are specific APIs in Rust to leak values; they aren't commonly used, but they are building blocks that can be accidentally mis-used. The other poster child, where you don't opt in to leaking, is an Rc cycle: Rust has no cycle collector for reference-counted pointers.

@Pauan

Pauan commented Mar 15, 2025

That's why everything generally works in Rust with no leaks as everything is rooted eventually in the stack most of the time.

Indeed, it's actually quite hard to accidentally leak memory in Rust. You usually have to do it explicitly by using an API like std::mem::forget or ManuallyDrop.

The only time where you might accidentally leak something is with an Rc / Arc cyclic reference. And even that requires mutation in order to create the cyclic reference.

@primoly

primoly commented Mar 17, 2025

Setting stream aside for a moment and looking just at async (e.g. import f: async func(x: list<u8>) -> string), does the problem here only arise from the fact that backpressure delays the lowering of parameters? If so, would it be possible to allow a caller to reallocate the parameters in case it receives a STARTING response? That way Rust would only need to make an intermediate copy when trying to reenter a locked component instance, I believe.

@alexcrichton
Collaborator Author

does the problem here only arise from the fact that backpressure delays the lowering of parameters?

Unfortunately no; a return-pointer is also passed which, if not cancelled, will be filled in asynchronously and possibly corrupt memory that Rust thinks is not writable by this future.


@vados-cosmonic, @yoshuawuyts, @dicej, and I talked about this in depth today as well and I also wanted to jot down notes from our discussion. The general conclusion we came to was:

  • There's nothing that should change about the component model to help and/or address this.
  • Rust async bindings will need to change to have all arguments as "owned". No slices anywhere.
  • Rust bindings may get better in the future if Rust gets linear types but that's pretty far-future at this point.
  • Streams will change to buffer-in-buffer-out-style APIs where a Vec is schlepped around in the APIs.
  • Binding list<T> in APIs means that, by default, the buffer will be lost when passed to imports. Mitigations for this include:
    • Short-term, you have to bind the API by hand and avoid using generated bindings.
    • Medium-term, we'll support "replace list<T> with MyList<T> instead of Vec<T>". Or perhaps just replacing list<u8> with a custom type.
    • Long-term, maybe have some sort of power-user API where you get back a "husk" of the input arguments to an imported function call. This "husk" would have all the allocations inside of it but things like own<T> resources would all be removed.
    • Long-term, generate unsafe functions parallel to the safe versions which are documented with a safety condition of "this cannot be leaked"

@Ddystopia

I believe Forget as proposed by the recent RFC is viable to implement and would allow that API to be expressed without issues.

@alexcrichton
Collaborator Author

I'm going to close this as I think we've figured out what to do on the Rust bindings side of things and nothing is going to change in the upstream spec as a result of this. If Rust changes, however, to get Forget or similar we can definitely update the bindings appropriately in the future.
