EOF in multi-line string error for valid format string #4588

skepppy · 2025-02-24T11:46:02Z

Describe the bug

Black fails to parse a valid f-string, while Python runs the same code without error.

To Reproduce

x = "test"

if f"{x}:\\":y = f'{x}'; print(y)

And run it with these arguments:

$ black test.py --target-version py310

The resulting error is:

error: cannot format test.py: Cannot parse for target version Python 3.10: 3:22: EOF in multi-line string

Expected behavior

The same code but formatted. Python does accept this input and will print y.
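A quick way to confirm that Python itself accepts the input is to parse and run the reproducer with the standard library. This is only a sketch: the `source` string is a copy of the test.py contents above, and the `buffer` and `output` names are illustrative.

```python
import ast
import contextlib
import io

# Contents of test.py from the report, kept verbatim via a raw string.
source = r'''x = "test"

if f"{x}:\\":y = f'{x}'; print(y)
'''

# ast.parse raises SyntaxError if CPython cannot parse the code.
tree = ast.parse(source)

# Run the parsed module and capture what it prints.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(compile(tree, "test.py", "exec"), {})
output = buffer.getvalue()
print(repr(output))
```

If the parse step succeeds, the failure is specific to Black's own tokenizer rather than to the code being formatted.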

Environment

$ black --version
black, 25.1.0 (compiled: yes)
Python (CPython) 3.10.12
$ python3 --version
Python 3.10.12
$ uname -a
Linux ubuntu 6.8.0-49-generic #49~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov  6 17:42:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux


MeGaGiGaGon · 2025-02-24T19:45:13Z

Minimized reproduction:

if f"\\":'{1}'

The quotes have to be different and the first string has to be an f-string.
This tokenizes as:

1,0-1,2:        NAME    'if'
1,3-1,5:        FSTRING_START   'f"'
1,5-1,10:       FSTRING_MIDDLE  '\\\\":\''
1,10-1,11:      LBRACE  '{'
1,11-1,12:      NUMBER  '1'
1,12-1,13:      RBRACE  '}'

The issue starts at this line, \src\blib2to3\pgen2\tokenize.py(869):

pseudomatch = pseudoprog.match(line, pos)

Here the match is <re.Match object; span=(2, 11), match=' f"\\\\":\'{'>
Stepping through the code, the f-string start f" is eventually popped correctly, but then at line 1017 the wrong FSTRING_MIDDLE is created by:

fstring_middle = line[start + offset : end_offset]

So the issue looks to be that the pseudoprog regex is too greedy, and the code that builds FSTRING_MIDDLE trusts it not to be.
Loading pseudoprog into regex101, group 17 is the one that pulls in too much of the input.

((?:[uUrRbB]|[rR][bB]|[bBuU][rR])?'(?:[^\n'\\]|\\.)*('|\\\r?\n)|(?:[uUrRbB]|[rR][bB]|[bBuU][rR])?"(?:[^\n"\\]|\\.)*("|\\\r?\n)|((?:rF|FR|Fr|fr|RF|F|rf|f|Rf|fR)')(?:\\N{|{{|\\'|[^\n'{])*(?<!\\N)({)(?!{)|((?:rF|FR|Fr|fr|RF|F|rf|f|Rf|fR)")(?:\\N{|{{|\\"|[^\n"{])*(?<!\\N)({)(?!{)|((?:rF|FR|Fr|fr|RF|F|rf|f|Rf|fR)')(?:[^\n'\\]|\\.)*('|\\\r?\n)|((?:rF|FR|Fr|fr|RF|F|rf|f|Rf|fR)")(?:[^\n"\\]|\\.)*("|\\\r?\n))

Looking through the debugger, the faulty part is (?:\\N{|{{|\\"|[^\n"{])*, which matches \\":'. The \\" alternative consumes the second backslash together with the closing quote, and on the following iterations [^\n"{] is able to keep matching. This once again comes from the same part of the regex as #4520, \src\blib2to3\pgen2\tokenize.py(129):

# beginning of a single quoted f-string. must not end with `{{` or `\N{`
SingleLbrace = r"(?:\\N{|{{|\\'|[^\n'{])*(?<!\\N)({)(?!{)"
DoubleLbrace = r'(?:\\N{|{{|\\"|[^\n"{])*(?<!\\N)({)(?!{)'

The area of the issue is slightly different this time, though. cc @tusharsadhwani, both because you'd know better how to fix it and because this would be a good case to add to #4536.
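The over-match can be reproduced outside Black with the `re` module. This is a sketch, not Black's actual code path: `DoubleLbrace` is copied from the snippet above, while `fstring_start` is a simplified stand-in that models only the `f`/`F` prefix instead of the full prefix alternation in pseudoprog.

```python
import re

# Copied from blib2to3/pgen2/tokenize.py as quoted above.
DoubleLbrace = r'(?:\\N{|{{|\\"|[^\n"{])*(?<!\\N)({)(?!{)'

# Hypothetical simplification: group 1 is the f-string start,
# group 2 is the opening brace captured by DoubleLbrace.
fstring_start = re.compile(r'([fF]")' + DoubleLbrace)

line = 'if f"\\\\":\'{1}\''  # the source text: if f"\\":'{1}'
match = fstring_start.match(line, 3)  # index 3 is where f" begins

# Everything between the f-string start and the brace becomes the
# FSTRING_MIDDLE, mirroring the slice line[start + offset : end_offset].
middle = line[match.end(1):match.start(2)]

print(match.span())  # the real f-string ends at index 8, the match does not
print(repr(middle))
```

The slice runs past the closing quote at index 7, which lines up with the 1,5-1,10 FSTRING_MIDDLE token in the tokenizer output above.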
With all of this found, you can minimize it further to just f"\\" '{1}'
f"\\"'{' is a further minimization just for fun. The { is needed to stop the regex from backtracking, which is what normally prevents the issue: the working case without a { takes a different path later in the regex, through (?:[^\n"\\]|\\.)*
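The role of the { can be seen by putting the two relevant branches side by side. Again this is a sketch under assumptions: `lbrace_branch` and `whole_string_branch` are illustrative names, both use a simplified `[fF]"` prefix, and the real pseudoprog contains more alternatives than these two.

```python
import re

# Branch that looks for the start of an f-string up to its first "{".
lbrace_branch = r'[fF]"(?:\\N{|{{|\\"|[^\n"{])*(?<!\\N)({)(?!{)'
# Later branch that matches a complete single-line string and handles
# backslash escapes pairwise via \\.
whole_string_branch = r'[fF]"(?:[^\n"\\]|\\.)*("|\\\r?\n)'

two_branches = re.compile("(?:" + lbrace_branch + "|" + whole_string_branch + ")")

with_brace = 'f"\\\\"\'{\''     # the source text: f"\\"'{'
without_brace = 'f"\\\\"\'x\''  # the source text: f"\\"'x'

# With a "{" later on the line, the first branch succeeds and overshoots
# the closing quote of the f-string.
overshoot = two_branches.match(with_brace).group(0)

# Without a "{", the first branch fails, so the regex falls through to the
# second branch, which stops at the real closing quote.
correct = two_branches.match(without_brace).group(0)

print(repr(overshoot))
print(repr(correct))
```

Only the second branch treats \\ as one escaped unit, which is why the closing quote is recognized on that path.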

tusharsadhwani · 2025-02-24T19:59:05Z

this already parses correctly with the new tokenizer :D

skepppy added the T: bug label Feb 24, 2025

JelleZijlstra added the F: strings and C: parser labels Feb 24, 2025
