
Efficient memory passing between WASM and host #314

Open

Description

@loganek

Hi all,

I do apologize in advance if this topic was already discussed and documented somewhere - if that's the case, I'd be grateful for any pointers to those discussions.

I've recently been looking into ways we can produce preview1-compatible WASM modules without affecting the overall development of WASI. In our case, we run a WASM runtime on customer devices, and in some cases the runtime can't be updated - so we're stuck with preview1 on the host for at least 4-5 years (and likely longer).

I've explored a few different options for staying up to date with the tooling (mainly wasi-libc) while remaining compatible with the old runtimes we have in production:

  1. I considered having a bunch of ifdefs in the code; this is pretty much what @abrown proposed in Document a plan for transitioning to preview 2 and beyond (wasi-libc#476). I think that's a good short-term plan (as mentioned in the doc, it's just for the "transition"), but I don't think it's maintainable given that we need to keep preview1 around for quite some time.
  2. Fork wasi-libc - another option we're considering, but we'd rather avoid it, since backporting bugfixes and improvements from upstream would become more and more difficult over time.
  3. Implement a wasi preview2 -> preview1 adapter.

While all three options are still on the table, I think the third one is the least intrusive: it requires very little maintenance overhead (it's almost a one-time effort) and lets us completely drop preview1 references from the tooling.

I've started working on a prototype of the adapter. So far I've prototyped the clock and sockets APIs (it's very buggy and limited, don't use it yet :) ), using WAMR-specific preview1-like interfaces, and I was able to run a simple TCP client/server application using @dicej's wasi-libc branch. When implementing the streams API, I realized that interfaces that return a list, e.g.

    read: func(
        len: u64
    ) -> list<u8>;

don't allow users to pass a pre-allocated buffer (e.g. on the stack); instead, the host is expected to call the WASM module's malloc/realloc to allocate memory for the return buffer. This is a problem for many embedded applications where dynamic allocation is discouraged or even not allowed.
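For context, as far as I understand the allocation the host performs goes through the module's canonical-ABI realloc export (commonly exported by the toolchain as cabi_realloc). A minimal, malloc-backed sketch of that export - simplified, ignoring the alignment parameter that real implementations handle - could look like:

    #include <stdlib.h>
    #include <stddef.h>

    /* Sketch of the realloc-style export the host calls back into so it can
     * place a returned list<u8> in the guest's linear memory. The canonical ABI
     * passes (old_ptr, old_size, alignment, new_size); this simplified version
     * ignores alignment, which real implementations (e.g. wasi-libc's) respect. */
    __attribute__((export_name("cabi_realloc")))
    void *cabi_realloc(void *old_ptr, size_t old_size, size_t align, size_t new_size) {
        (void)old_size;
        (void)align;
        if (new_size == 0) {
            free(old_ptr);
            return NULL;
        }
        return realloc(old_ptr, new_size);
    }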
In addition to that, for this to work with libc, the buffer allocated by the host must then be copied into the buffer the user passed to the libc function, so we get something like this (very high-level flow, missing lots of details, but hopefully it clearly illustrates the problem; a rough C sketch follows the list):

1. [wasm] user allocates a buffer (e.g. on the stack, char buf[1024])
2. [wasm] user wants to read data and calls: recv(buf, 1024, ...other parameters)
3. [wasm] recv implementation calls host's read function
4. [host] read implementation calls WASM's malloc/realloc to allocate a buffer for the return value
5. [host] copies data into the newly allocated buffer
6. [host] returns a pointer back to WASM
7. [wasm] recv implementation copies data from the pointer returned from host into the user-provided buffer
8. [wasm] recv function frees the memory allocated by the host
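To make that concrete, here is a very rough C sketch of steps 3-8 as seen from the libc side. stream_read and u8_list_t are made-up stand-ins for whatever the generated preview2 bindings actually look like:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>

    /* Hypothetical generated binding for the preview2 read import: the host
     * allocates out->ptr in guest memory (via the guest's realloc export),
     * copies the data into it, and fills in out->len. Names are illustrative,
     * not the actual wasi-libc/wit-bindgen symbols. */
    typedef struct { uint8_t *ptr; size_t len; } u8_list_t;
    extern int32_t stream_read(uint64_t max_len, u8_list_t *out);

    /* Simplified recv-style wrapper showing the extra copy and free
     * corresponding to steps 3-8 above. */
    ssize_t recv_like(uint8_t *user_buf, size_t user_len) {
        u8_list_t result;
        if (stream_read(user_len, &result) != 0)   /* steps 3-6 */
            return -1;
        size_t n = result.len < user_len ? result.len : user_len;
        memcpy(user_buf, result.ptr, n);           /* step 7: copy into the user's buffer */
        free(result.ptr);                          /* step 8: free the host-allocated buffer */
        return (ssize_t)n;
    }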

So not only do we spend cycles going from the host back into WASM to call the allocation function (which, depending on the implementation, might be slow), but we also allocate memory only temporarily, just to copy the value into a buffer the user has already allocated. This is how it works in wasi-libc, and I'm not sure whether other languages have a similar problem - even if they don't, C and the other languages relying on wasi-libc, like C++ or Rust, are probably popular enough that this problem shouldn't be neglected.

I understand why a list (and perhaps other data types) used as a return value requires allocating memory - the WASM code doesn't know how big the return value is, so it's reasonable to let the runtime drive the allocation. However, there are cases (like the read function) where the user already provides the maximum requested size, and in those cases they might also want to provide an already-allocated buffer.

I wonder whether the problem of efficient data passing between the host and WASM has been discussed, and if so, what the recommendation is. I'm not sure whether this is a problem with the component model per se, or more a problem with the design of specific interfaces (or both). For example, would it be possible for the read function to be something like:

    read: func(out_data: list<u8>) -> error

? This way, the host knows that the list already points to a buffer of a specific length, and it should fill it with the data. This is just one idea, but I'm curious what others think.
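For illustration, a caller-provided buffer could (hypothetically) lower on the libc side to something close to preview1's fd_read. stream_read_into below is entirely made up; it only sketches the shape I have in mind:

    #include <stdint.h>
    #include <stddef.h>
    #include <sys/types.h>

    /* Purely hypothetical lowering if read() accepted a caller-provided buffer:
     * the import takes the user's pointer and capacity directly and reports how
     * many bytes were written. None of these names exist today; this just
     * sketches the shape that would avoid the guest-side allocation and copy. */
    extern int32_t stream_read_into(uint8_t *buf, size_t buf_len, size_t *bytes_read);

    ssize_t recv_like(uint8_t *user_buf, size_t user_len) {
        size_t n = 0;
        /* The host writes straight into user_buf; no malloc, memcpy, or free needed. */
        if (stream_read_into(user_buf, user_len, &n) != 0)
            return -1;
        return (ssize_t)n;
    }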

Please note this isn't just a problem for the preview2-to-preview1 adapter, and it's not specific to any one interface (I can imagine lots of different proposals following the same pattern). If the problem isn't addressed, I think it might be a blocker for some embedded use cases onboarding to preview2+ (those projects would either stick to preview1 in some form, or not use WASM at all).
