Interface version canonicalization #536

lann · 2025-06-25T19:34:14Z

I stuck fullversion in the import/export productions rather than interfacename because I wanted it to be clear that it wouldn't be lowered into the core name.
The version canonicalization rules are adapted from Add BuildTargets.md #378. I'm still leaning toward omitting prerelease versions but I've only thought "medium hard" about it.
Still needs binary encoding; see comment below.
Not sure how best to capture the discussion about making canonicalization mandatory pre-1.0; the "Binary Warts" section doesn't seem quite right.

lann · 2025-06-25T21:28:14Z

For the binary encoding the most straightforward option from a quick review would seem to be adding variants of importname' / exportname' along the lines of:

importname' ::= 0x00 len:<u32> in:<importname>                       => in  (if len = |in|)
              | 0x01 len:<u32> in:<importname> fullverlen:<u16> fullver:<valid semver>

I suppose if we wanted to optimize the binary a bit this extra field could contain just the part of the original version that got lopped off by canonicalization.

On this field width:

fullverlen:<u16>

https://semver.org/#does-semver-have-a-size-limit-on-the-version-string

No, but use good judgment. A 255 character version string is probably overkill, for example. Also, specific systems may impose their own limits on the size of the string.

🤷

lukewagner · 2025-06-26T21:02:44Z

@lann Thanks for starting this! For the binary encoding question: yes, taking over the 0x00 byte and using it as a discriminant is a nice coincidence we can take advantage of (and could you update the corresponding bullet in the "Warts" section at the end)?

I suppose if we wanted to optimize the binary a bit this extra field could contain just the part of the original version that got lopped off by canonicalization.

Is there a simplicity argument to be made that requiring the concatenation of the version and the fullversion to match <valid semver> is simpler than allowing the fullversion to be <valid semver> and then adding the additional validation requirement (which I assume we want) that the fullversion has to "match" the version? If so, that could be a second argument in favor in addition to size.

lukewagner

Looking good! A few drive-by comments:

design/mvp/Explainer.md

lukewagner

(oops, meant to "comment" not approve before it's even ready to review 🙃 )

alexcrichton · 2025-06-27T14:41:51Z

For the binary encoding, here's another possible encoding:

importname' ::= 0x00 len:<u32> in:<importname>                       => in  (if len = |in|)
              | 0x01 len:<u32> in:<importname>                       => "${in.name}@N"  (if len = |in|,  in.version = N.*)
              | 0x02 len:<u32> in:<importname>                       => "${in.name}@0.N"  (if len = |in|,  in.version = 0.N.*)
              | 0x03 len:<u32> in:<importname>                       => "${in.name}@0.0.N"  (if len = |in|,  in.version = 0.0.N.*)

maybe with affordances for rc/etc.; I'm unsure. The basic idea, though, is that the actual import name in the binary format would always carry a full semver (of the form foo:bar/<interface>@<major>.<minor>.<patch>), while the semantic meaning (e.g. the text format) would be a subslice of that string. This codifies that the binary format always holds a valid semver, with the discriminant byte saying how to shorten it. The goal is to keep the binary format clear about what it can contain without changing the meaning at the parsed layer.
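A minimal sketch of how a decoder might apply the discriminant from the grammar above, assuming the conditions exactly as written there. The function name `semantic_import_name`, the example names, and the simplified version pattern are hypothetical; the pattern only checks the numeric core and that any remainder starts with `-` or `+`, and the grammar does not say whether 0x01 should reject a major version of 0, so this sketch does not.

```python
import re

# Simplified check: numeric core, then an optional prerelease/build tail.
# This is not a full SemVer validator.
_VERSION = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)([-+].*)?")


def semantic_import_name(discriminant: int, encoded_name: str) -> str:
    """Map the name stored in the binary to the name used for linking."""
    if discriminant == 0x00:
        # Existing encoding: the stored name is the name.
        return encoded_name

    base, sep, version = encoded_name.partition("@")
    match = _VERSION.fullmatch(version) if sep else None
    if match is None:
        raise ValueError(f"expected a full semver in {encoded_name!r}")
    major, minor, patch, _tail = match.groups()

    if discriminant == 0x01:
        return f"{base}@{major}"
    if discriminant == 0x02 and major == "0":
        return f"{base}@0.{minor}"
    if discriminant == 0x03 and major == "0" and minor == "0":
        return f"{base}@0.0.{patch}"
    raise ValueError(f"discriminant {discriminant:#04x} does not fit {version!r}")
```

For example, under these assumptions `semantic_import_name(0x02, "foo:bar/baz@0.2.3")` yields `"foo:bar/baz@0.2"`, while the same discriminant applied to a `1.2.3` version is rejected.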

fullverlen:<u16>

https://semver.org/#does-semver-have-a-size-limit-on-the-version-string

No, but use good judgment. A 255 character version string is probably overkill, for example. Also, specific systems may impose their own limits on the size of the string.

For this I'd recommend using <u32> regardless. We already limit many strings far below the theoretical 4G limit with a 32-bit length and keeping <u32> makes it more consistent with the rest of the decoding process. Otherwise when implementing a decoder you'd have to implement a specific function for decoding a 16-bit LEB which is otherwise not required when parsing WebAssembly today. Basically while I agree that >255 characters for a version is silly, I'd say that for consistency with the rest of the binary format this'd want to be <u32> if we go with this variant.
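To illustrate the consistency argument, here is a rough sketch of a decoder that reads every length-prefixed string through one unsigned 32-bit LEB128 routine, so a version field would need no extra integer decoder. The helper names `read_u32_leb128` and `read_name` are hypothetical and not taken from any existing parser.

```python
def read_u32_leb128(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 value that must fit in 32 bits.

    Returns (value, position after the last byte consumed).
    """
    result = 0
    for i in range(5):  # a u32 needs at most 5 LEB128 bytes
        if pos + i >= len(buf):
            raise ValueError("unexpected end of input")
        byte = buf[pos + i]
        result |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            if result > 0xFFFFFFFF:
                raise ValueError("value does not fit in u32")
            return result, pos + i + 1
    raise ValueError("LEB128 encoding too long for u32")


def read_name(buf: bytes, pos: int = 0) -> tuple[str, int]:
    """Read a len:<u32>-prefixed UTF-8 string."""
    length, pos = read_u32_leb128(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("string runs past end of input")
    return buf[pos:end].decode("utf-8"), end
```

A decoder could still cap the accepted length well below the 32-bit maximum after reading it, which matches the point that many strings are already limited far below the theoretical size.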

lukewagner · 2025-06-30T17:28:10Z

@alexcrichton Good idea; that cleanly answers some of the questions above. My only light concern is that tools might just treat the <importname> as the name and miss the nuance of chopping off parts of the versions. I suppose tests and common low-level tools could catch/factor-out most of this though. But if we go this direction: I suppose technically we don't even need the {0, 1, 2, 3} opcode; it could just be derived from the full <valid semver> string, making version canonicalization a binary encoding detail. Thoughts?

alexcrichton · 2025-06-30T18:03:27Z

I agree yeah there's risk since the name in the binary format is "so simple", but yeah that's also where I'd hope that tests could weed things out. It'd be pretty simple in parser libraries I'd imagine to avoid exposing the full name as the import name if the discriminant was present.

My thinking though was that the name always has a full and valid semver, as defined by semver itself. That way the discriminant says what the "real" import name is (e.g. chopping off other stuff) for linking/semantic purposes. Although I may be misunderstanding what you're thinking about how to drop the discriminant?

lann · 2025-06-30T18:25:16Z

I think @lukewagner is suggesting that the differences between 1/2/3 can be derived from the string itself. The algo would be something like:

starting at @:
if the string between @ and the first . isn't 0, trim everything from the first . onward
otherwise, if the string between @ and the second . isn't 0.0, trim everything from the second . onward
otherwise, trim immediately after the digits that follow the second . (the next character, if any, should only be - or +)
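The steps above could be sketched roughly as follows. The function name `canonicalize` and the example names are hypothetical, and the version pattern is a simplification that checks only the numeric core and that any tail begins with `-` or `+`. Whether prerelease versions should be canonicalized at all is still open in this thread; this sketch simply trims them.

```python
import re

# Numeric core of a version followed by whatever tail remains.
_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(.*)")


def canonicalize(name: str) -> str:
    """Derive the canonical name from a name carrying a full version."""
    base, sep, version = name.partition("@")
    match = _CORE.fullmatch(version) if sep else None
    if match is None:
        raise ValueError(f"no full version found in {name!r}")
    major, minor, patch, tail = match.groups()
    if tail and tail[0] not in "-+":
        raise ValueError(f"unexpected text after patch version: {tail!r}")

    if major != "0":
        canon = major               # 1.2.3 -> 1
    elif minor != "0":
        canon = f"0.{minor}"        # 0.2.3 -> 0.2
    else:
        canon = f"0.0.{patch}"      # 0.0.3-rc.1 -> 0.0.3
    return f"{base}@{canon}"
```

With this in place the discriminant values 1/2/3 carry no extra information: the same string always produces the same canonical form.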

alexcrichton · 2025-06-30T18:30:52Z

Ah I see! So something like (as a transition to the future):

importname' ::= 0x00 len:<u32> in:<importname>  => in  (if len = |in|)
              | 0x01 len:<u32> in:<importname>  => "${in.name}@${in.canonver}"  (if len = |in|)

where in the future we'd drop 0x00 entirely (and possibly rename 0x01 to 0x00). The <importname> is always required to have a full and valid semver too?

lann · 2025-06-30T18:41:36Z

I think of the "semver-aware" options I prefer @lukewagner's 1 (extra) discriminant option; if you are parsing semver anyway then the logic is only marginally more complex than the 3 discriminant option.

I'm more ambivalent on whether the parser should be semver-aware. I like the conceptual simplicity of "the name is the name" but this is a binary encoding and if we're going to require validation of semver then we're probably already committing to most of that code complexity anyway.

lann · 2025-07-01T18:45:23Z

We discussed this in a meeting today and decided to simplify a bit:

In the text format, fullversion will change to versionsuffix and hold just the part of the full version that is removed by canonicalization.
The binary format will use two strings: the canonicalized import name and the versionsuffix.
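As a rough illustration of the two-string scheme, the sketch below rejoins a canonicalized name with its versionsuffix and checks that the suffix really is what canonicalization would have removed. The helper names `canonical_prefix` and `full_name` are hypothetical, the version pattern is a simplified stand-in for full SemVer validation, and the exact validation rule is an assumption rather than something settled in this thread.

```python
import re

# Simplified check: numeric core, then an optional prerelease/build tail.
_VERSION = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)([-+].*)?")


def canonical_prefix(version: str) -> str:
    """Return the part of a full version kept by canonicalization."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"not a full version: {version!r}")
    major, minor, patch, _tail = match.groups()
    if major != "0":
        return major
    if minor != "0":
        return f"0.{minor}"
    return f"0.0.{patch}"


def full_name(canonical_name: str, version_suffix: str) -> str:
    """Rejoin the two stored strings and validate that they agree."""
    base, sep, canon = canonical_name.partition("@")
    if not sep:
        raise ValueError(f"no version in {canonical_name!r}")
    full = canon + version_suffix
    if canonical_prefix(full) != canon:
        raise ValueError(f"{version_suffix!r} is not a valid suffix of {canon!r}")
    return f"{base}@{full}"
```

Under these assumptions `full_name("foo:bar/baz@1", ".2.3")` gives `"foo:bar/baz@1.2.3"`, while pairing `"foo:bar/baz@1.2"` with `".3"` is rejected because `1.2` is not the canonical form of `1.2.3`.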

lann · 2025-07-01T22:53:07Z

After spending way too much time staring at SemVer and SemVer accessories I have a new draft of the explainer changes. I ran out of time to edit so hopefully it's still coherent...

I should be able to get to the binary format changes tomorrow.

lann force-pushed the truncated-versions branch 3 times, most recently from 7b6bd7d to 2f8eda8 Compare June 25, 2025 20:46

lann changed the title ~~WIP: Truncated interface versions~~ Interface version canonicalization Jun 25, 2025

lann mentioned this pull request Jun 25, 2025

Interface version / compatibilty changes #534

Open

lukewagner approved these changes Jun 26, 2025

lukewagner reviewed Jun 26, 2025

lann force-pushed the truncated-versions branch from 2f8eda8 to 6d56eaf Compare June 30, 2025 19:27

Add canonical interface name

d3efc82

lann force-pushed the truncated-versions branch from 6d56eaf to d3efc82 Compare July 1, 2025 22:49
