Implementing dlopen in the component model #401

Open
@alexcrichton

At the BA summit this past weekend I discussed with a few folks what it might look like to implement dlopen from C in the component model. What follows is a rough sketch of how this might be possible, intended to capture the conversations that happened. At this time I don't believe anyone's lined up to work on this, but nevertheless I wanted to capture the context we discussed and what might be necessary. This is a rough shape of a solution and will need more work to get standardized and implemented.

The general idea is that we'd like to explore adding component model intrinsics which support the ability to load an arbitrary wasm module at runtime, open it, and start executing it. This is what dlopen does on native platforms and is useful for a variety of use cases. Perhaps the chief motivation, though, is that existing language ecosystems expect dlopen to work, so supporting them requires an implementation of it.

The other general idea is that we'd like to standardize as-general-as-possible intrinsics and building blocks as necessary. Emscripten for example has a model of dynamic linking today but we don't want to bake that exactly as-is into the component model. Instead it should be possible to build various other forms of dynamic linking, if necessary, on top of component model intrinsics. The north star for now is the Emscripten-style dynamic linking since that's what tooling supports, but it's hoped that implementation support can still be generalized.

Component Model Changes

Supporting a full-fledged dlopen will require changes to the component model today.

Component Model: New Types

A new built-in resource type will be added to the component model, a "moduleref". For example in the component model you'll be able to do:

(component 
  (type $moduleref module)
  (import "x" (func (result (own $moduleref))))
)

A module here is a resource definition of a new type that the host understands. This is similar to declaring and importing a resource except that it's provided by the host and is the same across all components. This resource type can have own and borrow handles like other resources in the component model.

This new type would additionally be added to WIT, too.

Component Model: New WASI APIs

With this new type available in the component model the thinking is that new WASI APIs would be added for acquiring modules. This enables hosts to implement a variety of methods of identifying and loading modules. Furthermore by being WASI APIs it enables virtualizing these implementations as necessary too. Currently the rough idea is:

package wasi:compile;

interface compile {
    enum error { /* ... */ }

    // bikeshed this name, `wasi:compile/compile/compile` is a lot
    compile: func(wasm: list<u8>) -> result<module, error>;
}

interface preopens {
    get: func(name: string) -> option<module>;
}

Here a host can provide the ability to compile arbitrary wasm bytes. These bytes might be loaded through the filesystem, for example, or through other means. Hosts should be able to return "not supported" for compile; alternatively, this would be a great use case for optional imports.

Hosts can also provide a set of preopened modules (perhaps with a better name). These represent ahead-of-time compiled modules, for example, and might be more suitable in contexts where fully dynamic runtime compilation is not allowed.

When implementing dlopen it's expected that wasi-libc would locate the module-to-instantiate by doing something like:

  • First look up the module name with the preopens/get method. Use that if present.
  • Otherwise interpret the module name and try to find a file on the filesystem.
  • If found, compile it with compile. If that fails, then return an error.
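
The lookup order above can be modeled with a small sketch. This is only an illustration in Python, not wasi-libc code: `locate_module`, `Module`, `CompileError`, and the three callback parameters are hypothetical stand-ins for the sketched `preopens.get` and `compile` functions and for a filesystem read.

```python
from typing import Callable, Optional


class CompileError(Exception):
    """Hypothetical stand-in for the `error` enum of the sketched interface."""


class Module:
    """Opaque stand-in for a host-provided `module` handle."""

    def __init__(self, origin: str):
        self.origin = origin


def locate_module(
    name: str,
    preopens_get: Callable[[str], Optional[Module]],
    read_file: Callable[[str], Optional[bytes]],
    compile_wasm: Callable[[bytes], Module],
) -> Module:
    # Step 1: prefer a preopened (possibly ahead-of-time compiled) module.
    module = preopens_get(name)
    if module is not None:
        return module

    # Step 2: otherwise treat the name as a path and read the file.
    wasm = read_file(name)
    if wasm is None:
        raise FileNotFoundError(name)

    # Step 3: compile the bytes; a CompileError propagates to the caller,
    # which is where dlopen would report a failure.
    return compile_wasm(wasm)
```

A real implementation would also need to decide how library search paths map onto the name passed to dlopen, which this sketch leaves out.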

At this point dlopen has a handle to a module to instantiate, so the next bit is instantiating it.

Component Model: New Intrinsics

Instantiation is sketched here as entirely outside the realm of WIT. Everything that follows is purely a component model intrinsic (similar to resource.drop) and can be synthesized in any component.

First up are intrinsics to perform runtime inspection of a module. Everything here is listed as if it had mostly-WIT types, but each intrinsic here actually produces a core function.

  • module.imports_len : func(m: borrow<module>) -> u32 - returns the number of imports a module has
  • module.import_{module,name}_len : func(m: borrow<module>, import: u32) -> u32 - returns the byte length of the import name (utf-8 encoded)
  • module.import_{module,name} $memory : func(m: borrow<module>, import: u32, ptr: i32) - fills in ptr in linear memory with the contents of the nth import name.
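
These intrinsics imply a two-call pattern: ask for a length, then have the string written into linear memory. The sketch below models that pattern in Python, with a `bytearray` standing in for linear memory. `FakeModule`, `list_imports`, and the `scratch_ptr` convention are hypothetical; only the intrinsic names mirror the list above.

```python
class FakeModule:
    """Stand-in for a `module` handle; stores (module, name) import pairs."""

    def __init__(self, imports):
        self.imports = [(m.encode("utf-8"), n.encode("utf-8")) for m, n in imports]


def _write(memory, ptr, data):
    # Model an out-of-bounds trap instead of silently growing the bytearray.
    if ptr < 0 or ptr + len(data) > len(memory):
        raise IndexError("out-of-bounds write to linear memory")
    memory[ptr:ptr + len(data)] = data


def module_imports_len(m):
    return len(m.imports)


def module_import_module_len(m, i):
    return len(m.imports[i][0])


def module_import_name_len(m, i):
    return len(m.imports[i][1])


def module_import_module(memory, m, i, ptr):
    _write(memory, ptr, m.imports[i][0])


def module_import_name(memory, m, i, ptr):
    _write(memory, ptr, m.imports[i][1])


def list_imports(memory, m, scratch_ptr):
    """Read every import as a (module, name) pair via a scratch area."""
    result = []
    for i in range(module_imports_len(m)):
        mod_len = module_import_module_len(m, i)
        name_len = module_import_name_len(m, i)
        # Place the two strings back to back in the scratch area.
        module_import_module(memory, m, i, scratch_ptr)
        module_import_name(memory, m, i, scratch_ptr + mod_len)
        mod_end = scratch_ptr + mod_len
        mod = bytes(memory[scratch_ptr:mod_end]).decode("utf-8")
        name = bytes(memory[mod_end:mod_end + name_len]).decode("utf-8")
        result.append((mod, name))
    return result
```

In C the scratch area would more likely come from malloc, sized from the two length calls.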

Note that at this time type-reflection of modules isn't supported. It's expected that can be added later if needed, but it's hopefully not needed yet. (TODO: maybe these should just be component-model WIT types?)

Next there will additionally be an API to read custom sections of modules, for example dylink.0 in the Emscripten-based ABI:

  • module.custom_section_size : func(m: borrow<module>, name: string) -> option<u32> - returns the byte length of the custom section called name, or none if it's not present.
  • module.custom_section_read $memory : func(m: borrow<module>, dst: i32, len: i32, src: i32) - reads a custom section into linear memory with a memcpy-style API.
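
A memcpy-style read lets a caller pull a section through a fixed-size buffer in chunks. The Python model below sketches that loop. The sketched `custom_section_read` signature does not say how the section is selected, so this model passes the section name explicitly; that parameter, the `sections` dict, and `read_whole_section` are assumptions made for illustration.

```python
def custom_section_size(sections, name):
    """Return the section's byte length, or None if it is absent."""
    data = sections.get(name)
    return None if data is None else len(data)


def custom_section_read(memory, sections, name, dst, length, src):
    """Copy `length` bytes from section offset `src` to memory offset `dst`."""
    data = sections[name]
    if min(dst, length, src) < 0:
        raise IndexError("negative offset or length")
    if src + length > len(data) or dst + length > len(memory):
        raise IndexError("out-of-bounds custom section read")
    memory[dst:dst + length] = data[src:src + length]


def read_whole_section(memory, sections, name, buf_ptr, buf_len):
    """Read a full section through a small buffer in linear memory."""
    size = custom_section_size(sections, name)
    if size is None:
        return None
    out = bytearray()
    src = 0
    while src < size:
        chunk = min(buf_len, size - src)
        custom_section_read(memory, sections, name, buf_ptr, chunk, src)
        out += memory[buf_ptr:buf_ptr + chunk]
        src += chunk
    return bytes(out)
```

This model does not address repeated custom sections with the same name.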

(TODO: like above, maybe this is better modeled with component model types? Also needs to handle the possibility of repeated custom sections too)

Next there needs to be the ability to build up the set of imports that will be used to instantiate a module. This is done with an "imports builder" type which acts like a resource but doesn't actually have any definition in WIT or the component model itself (at least not at this time).

  • imports_builder.new : func() -> IB - create a new blank imports builder
  • imports_builder.drop : func(IB) - destroys a builder (TODO: maybe resource.drop?)
  • imports_builder.bind_{memory,global,table,func} $index : func(borrow<IB>, string, string) - binds the statically provided item to the names provided. This is used, for example, to provide a module's own memory to the import list
  • imports_builder.new_global_i32 : func(borrow<IB>, string, string, i32) - creates a brand new wasm global (mutable? new parameter?) with the provided initial value. (This is assumed to be needed for the Emscripten ABI.)
  • imports_builder.bind_funcref : func(borrow<IB>, string, string, funcref) - binds the provided function to the specified import name. This is used to provide a module's own functions to imports.

It's hoped that with all of the above it's possible to implement basically everything in dlopen from the Emscripten dynamic linking ABI. With all of this it culminates in a single intrinsic:

  • imports_builder.instantiate : func(borrow<module>, borrow<IB>) -> result<instance, string>

where this final instantiate intrinsic is used to perform instantiation itself (TODO: return type here needs some work).
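
To make the flow concrete, here is a rough Python model of an imports builder and the final instantiate step. It is a sketch under assumptions, not the proposed API: `ImportsBuilder`, its methods, and the tuple used to model `result<instance, string>` are hypothetical, and the "instance" is just the resolved import map. The names `env.memory` and `env.__memory_base` in the usage note follow the Emscripten-style dynamic linking convention.

```python
class ImportsBuilder:
    """Collects items keyed by (module, name) until instantiation."""

    def __init__(self):
        self._items = {}

    def bind(self, module, name, item):
        # Models the bind_{memory,global,table,func} and bind_funcref intrinsics.
        self._items[(module, name)] = item

    def new_global_i32(self, module, name, initial):
        # Models creating a fresh i32 global owned by the builder.
        self._items[(module, name)] = {"kind": "global", "type": "i32", "value": initial}

    def lookup(self, module, name):
        return self._items.get((module, name))


def instantiate(required_imports, builder):
    """Resolve every required import, or report the first missing one.

    Returns ("ok", resolved) or ("err", message).
    """
    resolved = {}
    for module, name in required_imports:
        item = builder.lookup(module, name)
        if item is None:
            return ("err", f"missing import {module}.{name}")
        resolved[(module, name)] = item
    return ("ok", resolved)
```

For example, a dlopen implementation might bind its own memory as `env.memory`, create `env.__memory_base` with `new_global_i32`, and then call `instantiate` with the import list it read from the module.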

There will also need to be an API or two to look up globals/functions on the returned instance.

Integration with wasi-libc

It's hoped that all of the above will be implementation details of dlopen in wasi-libc. It's not expected that applications will necessarily manipulate the intrinsics themselves. All the details of the Emscripten dynamic linking ABI, for example, would be encoded in wasi-libc: matching names, providing imports, manipulating memories and globals, etc.


This is very much a work-in-progress design. Even just writing this up I feel like we may want to shift more things into WIT or similar, or have WIT-defined builtins rather than so many intrinsics. Furthermore, there are a lot of details here to prove out, and we need to ensure that there's enough functionality to fully implement Emscripten's dynamic linking ABI.

cc @dicej, @fitzgen, @sunfishcode
