Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add more wpt tests for url_search_params #793

Closed
wants to merge 1 commit into from

Conversation

anonrig
Copy link
Member

@anonrig anonrig commented Dec 2, 2024

These tests are failing on Cloudflare workers, which uses URLSearchParams implementation in Ada.

Ref: https://github.com/cloudflare/workerd/blob/main/src/workerd/api/wpt/url-test.js#L118

Let's see if the issue is within ourselves or not.

cc @lemire @jasnell

@anonrig anonrig requested review from lemire and jasnell December 2, 2024 00:12
@anonrig anonrig force-pushed the yagiz/add-failing-tests branch from 9189c77 to 90d6770 Compare December 2, 2024 00:17
@lemire
Copy link
Member

lemire commented Dec 3, 2024

@anonrig

So ada does not do Unicode validation. For URL parsing, the spec says that we begin with a valid unicode string (i.e., scalar value string).

This is also true of the search params routine as far as I can tell:

string parser takes a scalar value string input, UTF-8 encodes it, and then returns the result of application/x-www-form-urlencoded parsing it.

Admittedly, url_search_params does not document that it expects a valid UTF-8 sequence. It probably should even though it is part of the specification. But elsewhere, we are quite clear on the expectation:

ada/README.md

Line 130 in f7fbea5

- Parse and validate a URL from an ASCII or a valid UTF-8 string.

* it should be a valid UTF-8 string. The parameter base_url is an optional

So what is happening in the Cloudflare workers tests? I did not look, but you presumably have.

Since they are called from JavaScript... what might be happening? Well. As far a I can tell, JavaScript does not replacement on its own... So the following is just fine in JavaScript:

> let s = "\uD835x=1&xx=2&\uD83Dx=3"
undefined
> s
'\ud835x=1&xx=2&\ud83dx=3'

See how it displays \ud835? That's not valid Unicode, but JavaScript is fine.

Now you can do...

> let s = "\uD835x=1&xx=2&\uD83Dx=3"
undefined
> s
'\ud835x=1&xx=2&\ud83dx=3'
> s.toWellFormed()
'�x=1&xx=2&�x=3'
> s.toWellFormed()[0] == "\ufffd"
true

See the question marks? They are the replacement character ("\ufffd").

I think that what is happening is that the Cloudflare system is starting with an unmatched surrogate pair (invalid Unicode). It does not do replacement.

After that, I am not sure.

Now, ada does accept UTF-16 inputs, only UTF-8... so the code must, somewhere, convert from UTF-16 (in this case invalid UTF-16) and then convert it to UTF-8 (hopefully valid UTF-8). But something goes wrong, it should do the UTF-16 to UTF-8 conversion with replacement. Maybe it does not.

@lemire
Copy link
Member

lemire commented Dec 3, 2024

As for the test you are trying to add, I submit to you that it is outside our scope. We have no support for invalid Unicode. And, in any case, we take in UTF-8 so the concept of unmatched surrogate pair does not apply.

Copy link
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recommendation at this time is to discard this PR. Please see my comments.

@anonrig anonrig closed this Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants