Add option to use utf-16 positions instead of utf-8 #126
Comments
@youknowone, what do you think?
I'd like to understand the problem better. Is utf-16 position calculation a performance bottleneck of the project?
The linter that we built parses the code on every keystroke, so performance is quite important to us. The utf-16 conversion is not the most critical performance bottleneck, but getting rid of it would still be a nice win for us. We currently iterate over the Python code once and create a lookup table that maps utf-8 positions to utf-16 positions (without libraries). It would be nice for us if the AST provided the utf-16 positions directly. I thought we could maybe parameterize the
Parser/vendored/src/text_size/traits.rs Lines 31 to 37 in 5e9d985
That way we would not need to create that lookup table in the linter.
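The lookup table described above can be sketched roughly as follows. This is only an illustration, not the linter's actual code: `Utf16Index` and `to_utf16` are hypothetical names that are not part of rustpython_parser, and the sketch assumes source files small enough for offsets to fit in a `u32`.

```rust
/// Hypothetical helper: maps utf-8 byte offsets to utf-16 code unit offsets.
/// Built in a single pass over the source, using only the standard library.
pub struct Utf16Index {
    /// table[b] is the utf-16 offset of the character that contains byte b.
    /// The final entry (index == source.len()) is the total utf-16 length.
    table: Vec<u32>,
}

impl Utf16Index {
    pub fn new(source: &str) -> Self {
        let mut table = vec![0u32; source.len() + 1];
        let mut utf16_pos = 0u32;
        for (byte_idx, ch) in source.char_indices() {
            // Every byte of a multi-byte character maps to the offset
            // at which that character starts in utf-16.
            for i in 0..ch.len_utf8() {
                table[byte_idx + i] = utf16_pos;
            }
            // A character takes 1 or 2 utf-16 code units.
            utf16_pos += ch.len_utf16() as u32;
        }
        table[source.len()] = utf16_pos;
        Self { table }
    }

    /// Returns None if `byte_offset` is past the end of the source.
    pub fn to_utf16(&self, byte_offset: usize) -> Option<u32> {
        self.table.get(byte_offset).copied()
    }
}
```

Each lookup is a constant-time index, but the table costs one `u32` per source byte and has to be rebuilt whenever the text changes, which is the per-keystroke overhead the comment above would like to avoid.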
I just saw this comment #125 (comment). I thought this was the parser that Ruff uses? The README is misleading if this is not true. Would you recommend that we also move to the Ruff parser?
The Rust ecosystem usually doesn't parametrize len() itself for utf16. Rather than that, adding
As for the README, that should be changed. I forgot the version number, but Ruff is now using its own parser. If performance is critical, that one has better performance according to their benchmark.
We use this parser for a custom linter that we ship through a VSCode extension (rustpython_parser::ast::Suite::parse_without_path). The VSCode extension API expects positions into the text encoded as utf-16, but the parser returns byte positions into the text encoded as utf-8. Can we add an option to use utf-16 positions?
I'd be happy to implement this feature if you're comfortable with adding it to the parser. Do you have suggestions on what the API should look like?
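To illustrate why the two kinds of position differ, here is a minimal sketch that converts a single utf-8 byte offset into a utf-16 offset on demand, using only the standard library. The function name `utf16_offset` is made up for this example and is not an API of the parser.

```rust
/// Hypothetical helper: converts a utf-8 byte offset into the number of
/// utf-16 code units that precede it. Returns None if the offset is out of
/// range or falls in the middle of a multi-byte character.
pub fn utf16_offset(source: &str, byte_offset: usize) -> Option<usize> {
    source
        .get(..byte_offset)
        .map(|prefix| prefix.encode_utf16().count())
}
```

For pure ASCII source the two offsets are identical. They diverge after the first non-ASCII character: "ü" takes 2 bytes in utf-8 but 1 code unit in utf-16, and "🐍" takes 4 bytes in utf-8 but 2 code units in utf-16. Calling this for every AST node rescans the prefix each time, which is why a precomputed table or positions emitted directly by the parser would be cheaper.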