EOF in multi-line string error for valid format string #4588

Open
skepppy opened this issue Feb 24, 2025 · 2 comments
Labels
C: parser How we parse code. Or fail to parse it. F: strings Related to our handling of strings T: bug Something isn't working

Comments


skepppy commented Feb 24, 2025

Describe the bug

Black fails to parse a valid f-string, while Python runs the same code without error.

To Reproduce

x = "test"

if f"{x}:\\":y = f'{x}'; print(y)

And run it with these arguments:

$ black test.py --target-version py310

The resulting error is:

error: cannot format test.py: Cannot parse for target version Python 3.10: 3:22: EOF in multi-line string

Expected behavior

The same code but formatted. Python does accept this input and will print y.
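
For anyone who wants to confirm the "Python accepts this" half of the report without Black installed, here is a minimal sketch. It only uses the standard library; the file name passed to `compile` is just a label, and the variable names are mine.

```python
import contextlib
import io

# The reproducer from this report, kept byte-for-byte via a raw string.
SOURCE = r'''x = "test"

if f"{x}:\\":y = f'{x}'; print(y)
'''

# compile() raises SyntaxError if CPython's own parser rejects the input.
code = compile(SOURCE, "test.py", "exec")

# Run it and capture what it prints.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code, {})
printed = buffer.getvalue()
```

If the snippet compiles and `printed` holds the value of `y`, the input is valid Python and the failure is on the formatter's side.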

Environment

$ black --version
black, 25.1.0 (compiled: yes)
Python (CPython) 3.10.12
$ python3 --version
Python 3.10.12
$ uname -a
Linux ubuntu 6.8.0-49-generic #49~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov  6 17:42:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
@skepppy skepppy added the T: bug Something isn't working label Feb 24, 2025
@JelleZijlstra JelleZijlstra added F: strings Related to our handling of strings C: parser How we parse code. Or fail to parse it. labels Feb 24, 2025
MeGaGiGaGon (Collaborator) commented Feb 24, 2025

Minimized reproduction:

if f"\\":'{1}'

The two strings have to use different quote characters, and the first string has to be an f-string.
This tokenizes as:

1,0-1,2:        NAME    'if'
1,3-1,5:        FSTRING_START   'f"'
1,5-1,10:       FSTRING_MIDDLE  '\\\\":\''
1,10-1,11:      LBRACE  '{'
1,11-1,12:      NUMBER  '1'
1,12-1,13:      RBRACE  '}'

The issue starts at this line \src\blib2to3\pgen2\tokenize.py(869)

pseudomatch = pseudoprog.match(line, pos)

The match there is <re.Match object; span=(2, 11), match=' f"\\\\":\'{'>
Stepping through the code, the f-string start f" is eventually popped correctly, but at line 1017 the wrong FSTRING_MIDDLE is created by

fstring_middle = line[start + offset : end_offset]

So the issue looks to be that the pseudoprog regex is too greedy, and the code that builds FSTRING_MIDDLE trusts the match end without checking it.
Loading pseudoprog into regex101 (link), group 17 is the one that consumes too much of the input.
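
To make the slicing problem concrete, here is a small sketch using only the column numbers from the token dump above. The indices 5, 7 and 10 are read off that dump and counted by hand; they are not the tokenizer's actual `start`, `offset` or `end_offset` values, which I have not reproduced here.

```python
# The minimized line from this comment, as raw source text.
line = r"""if f"\\":'{1}'"""

# Bounds read off the token dump: FSTRING_MIDDLE spans columns 5-10.
reported_middle = line[5:10]

# The f-string body should stop before its closing quote at column 7.
expected_middle = line[5:7]

# Everything from that closing quote up to the brace was swallowed.
swallowed = line[7:10]
```

`reported_middle` is what a slice bounded by the over-long match produces, and `swallowed` is the part that belongs to the following tokens instead.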

((?:[uUrRbB]|[rR][bB]|[bBuU][rR])?'(?:[^\n'\\]|\\.)*('|\\\r?\n)|(?:[uUrRbB]|[rR][bB]|[bBuU][rR])?"(?:[^\n"\\]|\\.)*("|\\\r?\n)|((?:rF|FR|Fr|fr|RF|F|rf|f|Rf|fR)')(?:\\N{|{{|\\'|[^\n'{])*(?<!\\N)({)(?!{)|((?:rF|FR|Fr|fr|RF|F|rf|f|Rf|fR)")(?:\\N{|{{|\\"|[^\n"{])*(?<!\\N)({)(?!{)|((?:rF|FR|Fr|fr|RF|F|rf|f|Rf|fR)')(?:[^\n'\\]|\\.)*('|\\\r?\n)|((?:rF|FR|Fr|fr|RF|F|rf|f|Rf|fR)")(?:[^\n"\\]|\\.)*("|\\\r?\n))

Looking through the debugger, the faulty part is this: (?:\\N{|{{|\\"|[^\n"{])* which matches \\":'. The first backslash is consumed by [^\n"{], then the \\" alternative consumes the second backslash together with the closing quote, and on the following iterations [^\n"{] is able to keep matching. This comes once again from the same part of the regex as #4520, \src\blib2to3\pgen2\tokenize.py(129)

# beginning of a single quoted f-string. must not end with `{{` or `\N{`
SingleLbrace = r"(?:\\N{|{{|\\'|[^\n'{])*(?<!\\N)({)(?!{)"
DoubleLbrace = r'(?:\\N{|{{|\\"|[^\n"{])*(?<!\\N)({)(?!{)'

but the area of the issue is slightly different. cc @tusharsadhwani, both because you'd know better how to fix it and because this would be a good case to add to #4536.
With all of this found, you can further minimize it to just f"\\" '{1}'
A further minimization, just for fun, is f"\\"'{'. The { is needed to stop the regex from backtracking. Backtracking is what normally prevents the issue: in the working case without a {, the brace patterns fail and the match takes a different path later in the regex, through (?:[^\n"\\]|\\.)*
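
The over-match can be reproduced outside the tokenizer with plain `re`. This is a sketch, not the tokenizer's real code path: `DoubleLbrace` is copied from the snippet above, `DoubleString` is the plain-string tail taken from the pseudoprog dump, and prefixing both with a literal `f"` (plus all variable names) is my simplification of the full prefix alternation.

```python
import re

# Copied from the snippet quoted above (tokenize.py, around line 129).
DoubleLbrace = r'(?:\\N{|{{|\\"|[^\n"{])*(?<!\\N)({)(?!{)'
# Plain double-quoted string body, from the tail of the pseudoprog dump.
DoubleString = r'(?:[^\n"\\]|\\.)*("|\\\r?\n)'

# Simplification: a literal f" stands in for the full prefix alternation.
lbrace_re = re.compile('f"' + DoubleLbrace)
string_re = re.compile('f"' + DoubleString)

with_brace = r"""f"\\" '{1}'"""    # a later string contains a {
without_brace = r"""f"\\" 'x'"""   # no { anywhere on the line

# The brace pattern runs past the closing quote to reach the later {.
greedy = lbrace_re.match(with_brace)
# The plain-string pattern stops at the real closing quote.
correct = string_re.match(with_brace)
# With no { to land on, the brace pattern backtracks and fails entirely.
no_brace = lbrace_re.match(without_brace)
```

Here `greedy` ends at offset 8 while the f-string really ends at offset 5, which is where `correct` stops. `no_brace` is `None`, which matches the observation that a `{` later on the line is required to trigger the bug.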

tusharsadhwani (Collaborator) commented Feb 24, 2025

This already parses correctly with the new tokenizer :D
