Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change escaping to hex escape sequences #65

Closed
michaelficarra opened this issue Nov 21, 2023 · 13 comments · Fixed by #67
Closed

change escaping to hex escape sequences #65

michaelficarra opened this issue Nov 21, 2023 · 13 comments · Fixed by #67

Comments

@michaelficarra
Copy link
Member

There's no need to add complexity of single-character identity escapes for every ASCII punctuator. I would prefer escaping using hex escape sequences instead, as discussed in #58. The only argument given against this is that you'd have to copy-paste any RegExp constructed using this function into a RegExp explainer to understand it, but let's be honest, you were going to have to do that anyway. @sophiebits also points out that by not modifying the grammar, we allow this feature to be polyfilled in older browsers.

@bakkot
Copy link
Collaborator

bakkot commented Nov 21, 2023

What's the argument for doing this, other than the polyfilling thing?

@michaelficarra
Copy link
Member Author

Less RegExp grammar complexity. While I still assert that nobody should be reading the output of RegExp.escape, these grammar additions apply to all RegExps, which will mean I will have to read (or at least be on the lookout for) escaped ASCII punctuators in any RegExp context. I don't want them if they serve no purpose other than to make it harder for me to mentally parse a RegExp.

@bakkot
Copy link
Collaborator

bakkot commented Nov 21, 2023

I'd prefer to encounter \& rather than \x26. At least I have some hope of figuring out what the first one means (i.e., &, the same as how \- means -, etc).

@ljharb
Copy link
Member

ljharb commented Nov 21, 2023

I agree; I would expect developers are quite comfortable with a backslash being a noop for the character, whereas hex escapes would be wildly unfamiliar.

@ljharb ljharb mentioned this issue Nov 21, 2023
32 tasks
@oliverfoster
Copy link
Contributor

oliverfoster commented Nov 21, 2023

As a lay person, if I may, I've got some questions.

Punctuator escaping

a) As hex

  • Polyfillable
  • Less complex

b) As human readable characters

  • More easily human readable
  • Shorter, prettier

Potential additional complexity

It sounds to me like a one or two line change, with a lookup table or equivalent for current punctuators, is that a fair assessment? Or is considerably more complex to produce one over the other?

Other questions

  1. What do other languages do?
  2. What do existing JS implementations do?
  3. What would be the impact of transitioning between styles at a later date?
  4. Does one route prevent or facilitate the other?
  5. Is either route essential?
  6. Could the polyfill produce hex and the standard produce human readable and would the hex be directly equivalent?

Preference

I'm in favour of whichever is simpler. I'd be happy if anything that impedes the progress of .escape is parked for a later date.
I don't think hex escaping is wildly unfamiliar (encodeURI, html special characters) and I agree that \& feels perfectly readable, if not normal (regex escape sequences).

@ljharb
Copy link
Member

ljharb commented Nov 21, 2023

@oliverfoster this can’t be parked for later; it has to be decided before the feature ships and likely can never be changed in the future.

Spec complexity will likely be about the same with either approach; a line or two of grammar vs a line or two to do the hex escape.

@DJ-Laser
Copy link

DJ-Laser commented Dec 6, 2023

I feel like pollyfill for older browsers is more important, and there can always be a function to translate hex codes into backslash escaped characters

@ljharb
Copy link
Member

ljharb commented Dec 6, 2023

We don't generally make changes to proposals solely due to polyfillability.

@ljharb
Copy link
Member

ljharb commented Feb 7, 2024

Rough consensus was to make this change; I'll do that, and then come back in a future meeting to seek stage 2.7.

ljharb added a commit that referenced this issue Mar 22, 2024
@bakkot
Copy link
Collaborator

bakkot commented Mar 22, 2024

Couple comments:

  • uppercase or lowercase?
  • some whitespace is not ascii and so needs \u rather than \x

@ljharb
Copy link
Member

ljharb commented Mar 22, 2024

Filed #67. Currently goes with lowercase.

@michaelficarra
Copy link
Member Author

The Encode AO (currently used by encodeURI and encodeURIComponent) uses uppercase.

Let hex be the String representation of octet, formatted as an uppercase hexadecimal number.

@ljharb
Copy link
Member

ljharb commented Mar 22, 2024

True, but the base64 proposal uses lowercase, as does Number.prototype.toString.

@ljharb ljharb closed this as completed in f01c310 Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants