feat: Allow users to hook early into the request to support Unicode and sanitization use cases #765
❓ Type of change
The Problem:
Sites that use non-ASCII characters directly in their URLs, for example sites for speakers of Spanish, French, Portuguese, Farsi, German, Polish, Russian, Chinese or Hindi, run into the following problem: URLs containing non-ASCII characters (e.g., accents, diacritics) can be percent-encoded with either uppercase or lowercase hexadecimal digits.
In other words, the hex digits of each percent-encoded byte may be uppercase or lowercase, because they must be treated as case-insensitive when decoded (see RFC 3986, Section 6.2.2.1, which also recommends normalizing them to uppercase).
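The case-normalization rule from RFC 3986, Section 6.2.2.1 can be sketched in TypeScript as follows. The helper name `normalizePercentEncodingCase` is hypothetical (it is not part of Node.js or of this pull request); it only uppercases the two hex digits of each `%XX` triplet and leaves every other character untouched.

```typescript
// Hypothetical helper (not a Node.js API, not part of this PR): applies the
// RFC 3986 Section 6.2.2.1 case normalization by uppercasing the hex digits
// of every percent-encoding triplet. Other characters are left as-is.
function normalizePercentEncodingCase(uri: string): string {
  return uri.replace(/%[0-9a-fA-F]{2}/g, (triplet) => triplet.toUpperCase());
}

// Annotated as `string` so TypeScript allows comparing the two literals.
const lower: string = "/encyclop%c3%a9die";
const upper: string = "/encyclop%C3%A9die";

console.log(lower === upper); // false: the raw strings differ only in hex case
console.log(normalizePercentEncodingCase(lower) === upper); // true
```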
So a word like 'encyclopédie' can be encoded as either 'encyclop%C3%A9die' or 'encyclop%c3%a9die'.
And what's the problem with that? Matching! Although 'encyclopédie' === 'encyclopédie', the encoded forms do not match: 'encyclop%c3%a9die' !== 'encyclop%C3%A9die'.
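A minimal TypeScript sketch of the mismatch, using only the standard `decodeURIComponent` and `encodeURIComponent` globals: the two raw strings differ, yet both decode to the same word, and `encodeURIComponent` itself emits the uppercase form.

```typescript
// Two spellings of the same path segment that differ only in the case of
// the hex digits inside the percent-encoding triplets.
const fromLowercaseClient: string = "encyclop%c3%a9die";
const fromUppercaseClient: string = "encyclop%C3%A9die";

// A plain string comparison (e.g. an exact-match route lookup) sees two paths.
console.log(fromLowercaseClient === fromUppercaseClient); // false

// Decoding is case-insensitive for hex digits, so both yield the same word.
const a = decodeURIComponent(fromLowercaseClient);
const b = decodeURIComponent(fromUppercaseClient);
console.log(a === b, a === "encyclopédie"); // true true

// encodeURIComponent always produces uppercase hex digits.
console.log(encodeURIComponent("encyclopédie")); // encyclop%C3%A9die
```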
As a result, if your customers want properly localized URLs, you will run into a lot of trouble: on the one hand, Chrome sends requests that correctly use the uppercase version; on the other hand, you'll see Node.js treating those URLs as lowercase.
This mismatch is a headache, because the only ways to deal with it are to create custom rules on front-end servers, or to lose incoming requests that humans intended to match the word 'encyclopédie' but that never reach it because different Internet actors implement the encoding differently. When the client does not explicitly encode its request (which is increasingly common, for example with APIs), it simply gets a 404 or a 500, depending on how the app handles mismatched requests.
Why does this happen?
First, the Internet was built mainly for English-speaking users, and some standards take time to adapt. Many frameworks already handle this situation properly; for example, if you work with Golang/Caddy you'll see that they use Unicode.
But Node.js doesn't.
How can it be solved?
By letting users customize this behaviour as early as possible, through a very early hook for sanitizing the incoming Node.js request.
This lets each user hook in at that point and handle their own needs.
For example:
This won't hurt anyone, but it will let users safely choose how to handle this problem.
The Solution
As you can see in the pull request, I added a new hook called 'onSanitizeRequest' that each user can opt into. It lets users customize URL behaviour before any URL logic runs for the app, without moving (for safety reasons) the current onRequest hook.
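To illustrate the intended ordering, here is a hypothetical, framework-agnostic model in TypeScript. Apart from the hook names `onSanitizeRequest` and `onRequest`, every name (`MiniApp`, `addHook`, `get`, `handle`) is invented for this sketch and is not the API of the framework this pull request targets; routing is reduced to an exact string match on the path.

```typescript
// Hypothetical lifecycle model: only the hook names come from this PR.
interface IncomingRequest {
  url: string;
}
type HookName = "onSanitizeRequest" | "onRequest";
type Hook = (req: IncomingRequest) => void;
type Handler = (req: IncomingRequest) => string;

class MiniApp {
  private readonly hooks: Record<HookName, Hook[]> = {
    onSanitizeRequest: [],
    onRequest: [],
  };
  private readonly routes = new Map<string, Handler>();

  addHook(name: HookName, fn: Hook): void {
    this.hooks[name].push(fn);
  }

  get(path: string, handler: Handler): void {
    this.routes.set(path, handler);
  }

  handle(req: IncomingRequest): { status: number; body: string } {
    // 1. New: sanitizers run first, before any URL logic.
    for (const fn of this.hooks.onSanitizeRequest) fn(req);
    // 2. Existing onRequest hooks keep their original position.
    for (const fn of this.hooks.onRequest) fn(req);
    // 3. Exact-match routing on the path (query string ignored).
    const q = req.url.indexOf("?");
    const path = q === -1 ? req.url : req.url.slice(0, q);
    const handler = this.routes.get(path);
    return handler
      ? { status: 200, body: handler(req) }
      : { status: 404, body: "Not Found" };
  }
}

const app = new MiniApp();
app.get(encodeURI("/encyclopédie"), () => "article"); // "/encyclop%C3%A9die"

const before = app.handle({ url: "/encyclop%c3%a9die" });
console.log(before.status); // 404: lowercase hex does not match the route

app.addHook("onSanitizeRequest", (req) => {
  req.url = req.url.replace(/%[0-9a-fA-F]{2}/g, (t) => t.toUpperCase());
});
const after = app.handle({ url: "/encyclop%c3%a9die" });
console.log(after.status); // 200
```

Because sanitizers only run when registered, an app with no `onSanitizeRequest` hook behaves exactly as before, and existing `onRequest` hooks keep their position in the lifecycle.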
So the user decides how to sanitize incoming requests, for example by passing a decoder/encoder:
This way, all later processing works just as if the incoming request had been properly encoded. No problem for anyone!
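One possible "decoder/encoder" sanitizer, sketched in TypeScript under the assumption that the hook can read and overwrite the raw request URL as a string. The function name `reencodePath` is hypothetical. It decodes and re-encodes each path segment separately, so an encoded slash (`%2F`) stays inside its segment, and it leaves the query string and malformed escapes untouched.

```typescript
// Hypothetical sanitizer: decodes and re-encodes each path segment so that
// lowercase triplets ("%c3%a9") and raw non-ASCII characters ("é") both end
// up in the uppercase form produced by encodeURIComponent ("%C3%A9").
function reencodePath(url: string): string {
  const q = url.indexOf("?");
  const path = q === -1 ? url : url.slice(0, q);
  const rest = q === -1 ? "" : url.slice(q); // query string left untouched

  const segments = path.split("/").map((segment) => {
    try {
      return encodeURIComponent(decodeURIComponent(segment));
    } catch {
      // Malformed escape (e.g. "%zz" or a lone "%E9"): keep the segment as sent.
      return segment;
    }
  });
  return segments.join("/") + rest;
}

console.log(reencodePath("/encyclop%c3%a9die?q=%c3%a9")); // /encyclop%C3%A9die?q=%c3%a9
console.log(reencodePath("/encyclopédie")); // /encyclop%C3%A9die
```

Registered through the proposed hook, this would be a one-line callback assigning `req.url = reencodePath(req.url)` (assuming a writable `req.url`). One trade-off: `encodeURIComponent` also escapes characters such as `:` and `@`, which are legal unescaped in a path, so a narrower sanitizer that only uppercases existing `%XX` triplets may be preferable when the rest of the raw URL should stay exactly as sent.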
Will this break anything?
No, because it doesn't change any existing configuration. That's why I'm not proposing to simply move the existing 'onRequest' hook earlier: that hook may contain unexpected logic (we can't know) that could break something in a project. Instead, this adds a new, opt-in hook point, so nothing existing changes.
📝 Checklist