Printer quoting bug #1720

ef4 · 2025-02-24T20:52:59Z

I hit this printer bug while testing @embroider/template-tag-codemod against http://github.com/rust-lang/crates.io.

The printer replaces ' with ", even when there were already " characters inside the string literal, and they don't get escaped, resulting in a syntax error.

The real line that triggered this was: https://github.com/rust-lang/crates.io/blob/54ad496796eaf3499fc2b6fb96d167b3e94d7d70/app/components/crate-row.hbs#L12

github-actions · 2025-02-24T20:57:34Z

This PR main

Dev

583K └─┬ .
169K   ├── runtime
160K   ├── syntax
100K   ├── compiler
 58K   ├── opcode-compiler
 27K   ├── manager
 19K   ├── validator
 11K   ├── program
8.9K   ├── reference
7.2K   ├── destroyable
6.3K   ├── util
4.3K   ├── node
3.4K   ├── global-context
2.5K   ├── wire-format
1.0K   ├── vm
969B   ├── encoder
844B   ├── vm-babel-plugins
606B   └── owner

582K └─┬ .
169K   ├── runtime
159K   ├── syntax
100K   ├── compiler
 58K   ├── opcode-compiler
 27K   ├── manager
 19K   ├── validator
 11K   ├── program
8.9K   ├── reference
7.2K   ├── destroyable
6.3K   ├── util
4.3K   ├── node
3.4K   ├── global-context
2.5K   ├── wire-format
1.0K   ├── vm
969B   ├── encoder
844B   ├── vm-babel-plugins
606B   └── owner

Prod

230K └─┬ .
 70K   ├── syntax
 63K   ├── runtime
 48K   ├── compiler
 18K   ├── opcode-compiler
7.9K   ├── manager
4.8K   ├── program
4.0K   ├── validator
3.6K   ├── reference
2.4K   ├── util
2.1K   ├── node
1.6K   ├── wire-format
1.5K   ├── destroyable
737B   ├── vm
594B   ├── global-context
516B   ├── encoder
469B   ├── vm-babel-plugins
155B   └── owner

230K └─┬ .
 69K   ├── syntax
 63K   ├── runtime
 48K   ├── compiler
 18K   ├── opcode-compiler
7.9K   ├── manager
4.8K   ├── program
4.0K   ├── validator
3.6K   ├── reference
2.4K   ├── util
2.1K   ├── node
1.6K   ├── wire-format
1.5K   ├── destroyable
737B   ├── vm
594B   ├── global-context
516B   ├── encoder
469B   ├── vm-babel-plugins
155B   └── owner

ef4 · 2025-02-25T16:50:25Z

Added proposed fix, which preserves the original quote style whenever possible, based on the source loc info.

chancancode · 2025-02-25T19:00:16Z

Hm, definitely agree we should fix the bug.

However I also think the printer should be self-consistent. ~~In particular, I think the way we handle this should be the same as StringLiteral, which doesn't seem to have the same bug, by using JSON.stringify().~~

Robert did a bunch of stuff here for codemods, which I wasn't super familiar with. Reading the current implementation though, it seems like the printer generally takes on a "normalizing" role by default (e.g. all comment styles get normalized to the full form) – but there is a hook that codemods can override to accomplish things like this.

I'm leaning towards we should preserve that overall characteristics here (i.e. by fixing this via JSON.stringify or similar and always normalize into double quotes)?

If we don't think that normalizing mode is generally useful, that's fine too, but we should make a deliberate decision to start moving in the other direction and ask for/accept patches to fix the comment formatting, etc.

I think a problem we have/used to have is things get added/changed to solve one specific problem in a way that's inconsistent with the existing design, and overtime it becomes hard to tell what the code is supposed to do and manage people's expectations. Personally, that's the kind of think I care about, but I also don't actively work on this very much so if other maintainers don't share that perspective, feel free to merge.

(Looking at things more closely, it seems like concat possibly have the same bug, which will affect attributes with curlies I think.)

chancancode · 2025-02-25T23:56:55Z

It also occurred to me that there is an additional problem with the current code that the proposed patch didn’t solve: slashes escaped in the original hbs source will be parsed into node.value with the intended semantic meaning, but if we just print node.value back out into the buffer naively, the escape characters will be lost and that can be malformed regardless of quoting style.

~~So that’s another argument in favor of JSON.stringify() for correctness, which should re-escape the output as needed for double quotes.~~

I’m supportive of a best-effort verbatim codemod mode if someone wants it (personally, I also don’t mind the codemod outputting whatever as long as it’s correct, then have prettier fix it), but IMO the current code isn’t intending to do that; if that’s the new goal it would need to be done carefully/holistically with better test coverage around these edge cases.

chancancode · 2025-02-26T00:15:13Z

hm, I guess not, it seems like attribute nodes have special parsing rules that are unlike string literals: https://astexplorer.net/#/gist/3763b1482a332946dc94c727c33d8ea8/e4def25bc6c4bacf8aff3d7c1f10c28a4a952362

In that case, this is less of a stylistic choice and more of a correctness issue. It's probably more correct to check node.value (as opposed to loc source, which can be missing/wrong, like in #1430) for includes('"'):

no – use "
yes –
- assert(!includes("'")) – no way to print this currently; must be a bad AST transform?
- use '

Does that work?

chancancode · 2025-02-26T00:18:56Z

packages/@glimmer/syntax/lib/generation/printer.ts

+      let quote = '"';
+      let originalChar = value.loc.asString()[0];
+      if (originalChar === "'") {
+        quote = "'";
+      }


Suggested change

let quote = '"';

let originalChar = value.loc.asString()[0];

if (originalChar === "'") {

quote = "'";

}

let quote = '"';

if (node.value.includes('"')) {

// or whatever we need to do to report errors properly in this context

assert('unable to print malformed attribute node, an attribute cannot contain both `"` and `'`', !node.value.includes("'"));

quote = "'";

}

ef4 · 2025-02-26T15:41:12Z

Yeah, this general strategy seems fine, I didn't love the idea of relying on loc.

no way to print this currently; must be a bad AST transform?

I think the HTML answer here is entity encoding, and there are already other tests that are exercising that. In the default mode, my example already transforms like this:

-<div class='He said "yes"' ></div>
+<div class="He said &quot;yes&quot;"></div>

which is correct, but presumably is an example of what annoyed somebody enough that they introduced mode: 'codemod' in the parser and entityEncoding: 'raw' in the printer. The bug is only in entityEncoding: 'raw' mode, and should probably be targeted at only that mode.

I’m supportive of a best-effort verbatim codemod mode if someone wants it (personally, I also don’t mind the codemod outputting whatever as long as it’s correct, then have prettier fix it), but IMO the current code isn’t intending to do that;

Unfortunately that horse left the barn many years ago. We're looking at a bug while using an existing mode named "codemod". So it absolutely does intend to do that, it just does it badly. The implementation is very confused and poorly layered (which is why it's also riddled with override points).

Of course I agree it would be great to holistically refactor the printer to be self-consistent. Somebody's gotta do it though. Until then, we have an existing "codemod mode" with a correctness bug.

ef4 · 2025-02-26T16:24:55Z

New implementation is pushed and passing tests.

It no longer uses source loc info.
It only changes behavior when using entityEncoding: 'raw' mode, which is the only case that had this bug.
It opportunistically uses single quotes when that avoids need to replace double quotes with "
It falls back to doing the full encoding that the default mode uses when we encounter both kinds of quotes in one attribute (the test I added demonstrates how an AST transform can cause this to happen)

chancancode · 2025-02-26T19:10:04Z

Yea, seems good to me.

For the codemod mode, reading the comments here, my impression is that the intention is for the caller (whoever wires up the codemod mode upstream) to use this:

glimmer-vm/packages/@glimmer/syntax/lib/generation/printer.ts

Lines 33 to 47 in 1c34f75

    
             /** 
        
              * Used to override the mechanism of printing a given AST.Node. 
        
              * 
        
              * This will generally only be useful to source -> source codemods 
        
              * where you would like to specialize/override the way a given node is 
        
              * printed (e.g. you would like to preserve as much of the original 
        
              * formatting as possible). 
        
              * 
        
              * When the provided override returns undefined, the default built in printing 
        
              * will be done for the AST.Node. 
        
              * 
        
              * @param ast the ast node to be printed 
        
              * @param options the options specified during the print() invocation 
        
              */ 
        
             override?(ast: ASTv1.Node, options: PrinterOptions): void | string;

I don’t think that’s necessarily the end consumer, but maybe some code in here/some shared thing Robert made years ago?

In any case trying to avoid HTML escaping where possible seems like a reasonable goal even with the normalizing philosophy, especially when it’s reliable. So I’m good with this regardless.

ef4 · 2025-02-26T20:12:54Z

I'm aware of the override thing, it's really only useful for one very specific use case, which is when you want to return the cached old serialization of known nodes that haven't changed. I think the only user is https://github.com/ember-template-lint/ember-template-recast/blob/3da4525e530436ea3faec6ff9b402a0fadd2e900/src/index.ts#L15-L19

I'm not doing anything like that, I don't actually care about trying to get format preservation, I'm only using "codemod mode" to get the better entity handling (which I agree should probably just be part of the default behavior anyway).

Add failing test for printer quoting bug

bc7a615

ef4 mentioned this pull request Feb 24, 2025

codemod printer bug embroider-build/embroider#2314

Draft

ef4 requested review from wycats and chancancode February 25, 2025 16:50

ef4 force-pushed the printer-quote-bug branch from 72f0246 to 1c34f75 Compare February 25, 2025 16:59

chancancode reviewed Feb 26, 2025

View reviewed changes

ef4 force-pushed the printer-quote-bug branch 3 times, most recently from 60406be to a6292f4 Compare February 26, 2025 16:18

Don't emit broken unescaped quotes in entityEncodingRaw mode

656817c

ef4 force-pushed the printer-quote-bug branch from a6292f4 to 656817c Compare February 26, 2025 16:33

chancancode approved these changes Feb 26, 2025

View reviewed changes

ef4 merged commit 3b3ca18 into main Feb 26, 2025
9 checks passed

ef4 deleted the printer-quote-bug branch February 26, 2025 20:12

github-actions bot mentioned this pull request Feb 24, 2025

Prepare Release #1715

Open

NullVoxPopuli added the bug label Feb 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Printer quoting bug #1720

Printer quoting bug #1720

ef4 commented Feb 24, 2025

github-actions bot commented Feb 24, 2025 •

edited

Loading

ef4 commented Feb 25, 2025

chancancode commented Feb 25, 2025 •

edited

Loading

chancancode commented Feb 25, 2025 •

edited

Loading

chancancode commented Feb 26, 2025 •

edited

Loading

chancancode Feb 26, 2025

ef4 commented Feb 26, 2025

ef4 commented Feb 26, 2025

chancancode commented Feb 26, 2025

ef4 commented Feb 26, 2025

Printer quoting bug #1720

Printer quoting bug #1720

Conversation

ef4 commented Feb 24, 2025

github-actions bot commented Feb 24, 2025 • edited Loading

ef4 commented Feb 25, 2025

chancancode commented Feb 25, 2025 • edited Loading

chancancode commented Feb 25, 2025 • edited Loading

chancancode commented Feb 26, 2025 • edited Loading

chancancode Feb 26, 2025

Choose a reason for hiding this comment

ef4 commented Feb 26, 2025

ef4 commented Feb 26, 2025

chancancode commented Feb 26, 2025

ef4 commented Feb 26, 2025

github-actions bot commented Feb 24, 2025 •

edited

Loading

chancancode commented Feb 25, 2025 •

edited

Loading

chancancode commented Feb 25, 2025 •

edited

Loading

chancancode commented Feb 26, 2025 •

edited

Loading