Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Printer quoting bug #1720

Merged
merged 2 commits into from
Feb 26, 2025
Merged

Printer quoting bug #1720

merged 2 commits into from
Feb 26, 2025

Conversation

ef4
Copy link
Contributor

@ef4 ef4 commented Feb 24, 2025

I hit this printer bug while testing @embroider/template-tag-codemod against http://github.com/rust-lang/crates.io.

The printer replaces ' with ", even when there were already " characters inside the string literal, and they don't get escaped, resulting in a syntax error.

The real line that triggered this was: https://github.com/rust-lang/crates.io/blob/54ad496796eaf3499fc2b6fb96d167b3e94d7d70/app/components/crate-row.hbs#L12

Copy link
Contributor

github-actions bot commented Feb 24, 2025

This PRmain
Dev
583K └─┬ .
169K   ├── runtime
160K   ├── syntax
100K   ├── compiler
 58K   ├── opcode-compiler
 27K   ├── manager
 19K   ├── validator
 11K   ├── program
8.9K   ├── reference
7.2K   ├── destroyable
6.3K   ├── util
4.3K   ├── node
3.4K   ├── global-context
2.5K   ├── wire-format
1.0K   ├── vm
969B   ├── encoder
844B   ├── vm-babel-plugins
606B   └── owner
582K └─┬ .
169K   ├── runtime
159K   ├── syntax
100K   ├── compiler
 58K   ├── opcode-compiler
 27K   ├── manager
 19K   ├── validator
 11K   ├── program
8.9K   ├── reference
7.2K   ├── destroyable
6.3K   ├── util
4.3K   ├── node
3.4K   ├── global-context
2.5K   ├── wire-format
1.0K   ├── vm
969B   ├── encoder
844B   ├── vm-babel-plugins
606B   └── owner
Prod
230K └─┬ .
 70K   ├── syntax
 63K   ├── runtime
 48K   ├── compiler
 18K   ├── opcode-compiler
7.9K   ├── manager
4.8K   ├── program
4.0K   ├── validator
3.6K   ├── reference
2.4K   ├── util
2.1K   ├── node
1.6K   ├── wire-format
1.5K   ├── destroyable
737B   ├── vm
594B   ├── global-context
516B   ├── encoder
469B   ├── vm-babel-plugins
155B   └── owner
230K └─┬ .
 69K   ├── syntax
 63K   ├── runtime
 48K   ├── compiler
 18K   ├── opcode-compiler
7.9K   ├── manager
4.8K   ├── program
4.0K   ├── validator
3.6K   ├── reference
2.4K   ├── util
2.1K   ├── node
1.6K   ├── wire-format
1.5K   ├── destroyable
737B   ├── vm
594B   ├── global-context
516B   ├── encoder
469B   ├── vm-babel-plugins
155B   └── owner

@ef4
Copy link
Contributor Author

ef4 commented Feb 25, 2025

Added proposed fix, which preserves the original quote style whenever possible, based on the source loc info.

@ef4 ef4 requested review from wycats and chancancode February 25, 2025 16:50
@ef4 ef4 force-pushed the printer-quote-bug branch from 72f0246 to 1c34f75 Compare February 25, 2025 16:59
@chancancode
Copy link
Contributor

chancancode commented Feb 25, 2025

Hm, definitely agree we should fix the bug.

However I also think the printer should be self-consistent. In particular, I think the way we handle this should be the same as StringLiteral, which doesn't seem to have the same bug, by using JSON.stringify().

Robert did a bunch of stuff here for codemods, which I wasn't super familiar with. Reading the current implementation though, it seems like the printer generally takes on a "normalizing" role by default (e.g. all comment styles get normalized to the full form) – but there is a hook that codemods can override to accomplish things like this.

I'm leaning towards we should preserve that overall characteristics here (i.e. by fixing this via JSON.stringify or similar and always normalize into double quotes)?

If we don't think that normalizing mode is generally useful, that's fine too, but we should make a deliberate decision to start moving in the other direction and ask for/accept patches to fix the comment formatting, etc.

I think a problem we have/used to have is things get added/changed to solve one specific problem in a way that's inconsistent with the existing design, and overtime it becomes hard to tell what the code is supposed to do and manage people's expectations. Personally, that's the kind of think I care about, but I also don't actively work on this very much so if other maintainers don't share that perspective, feel free to merge.

(Looking at things more closely, it seems like concat possibly have the same bug, which will affect attributes with curlies I think.)

@chancancode
Copy link
Contributor

chancancode commented Feb 25, 2025

It also occurred to me that there is an additional problem with the current code that the proposed patch didn’t solve: slashes escaped in the original hbs source will be parsed into node.value with the intended semantic meaning, but if we just print node.value back out into the buffer naively, the escape characters will be lost and that can be malformed regardless of quoting style.

So that’s another argument in favor of JSON.stringify() for correctness, which should re-escape the output as needed for double quotes.

I’m supportive of a best-effort verbatim codemod mode if someone wants it (personally, I also don’t mind the codemod outputting whatever as long as it’s correct, then have prettier fix it), but IMO the current code isn’t intending to do that; if that’s the new goal it would need to be done carefully/holistically with better test coverage around these edge cases.

@chancancode
Copy link
Contributor

chancancode commented Feb 26, 2025

hm, I guess not, it seems like attribute nodes have special parsing rules that are unlike string literals: https://astexplorer.net/#/gist/3763b1482a332946dc94c727c33d8ea8/e4def25bc6c4bacf8aff3d7c1f10c28a4a952362

In that case, this is less of a stylistic choice and more of a correctness issue. It's probably more correct to check node.value (as opposed to loc source, which can be missing/wrong, like in #1430) for includes('"'):

  • no – use "
  • yes –
    • assert(!includes("'")) – no way to print this currently; must be a bad AST transform?
    • use '

Does that work?

Comment on lines 296 to 301
let quote = '"';
let originalChar = value.loc.asString()[0];
if (originalChar === "'") {
quote = "'";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let quote = '"';
let originalChar = value.loc.asString()[0];
if (originalChar === "'") {
quote = "'";
}
let quote = '"';
if (node.value.includes('"')) {
// or whatever we need to do to report errors properly in this context
assert('unable to print malformed attribute node, an attribute cannot contain both `"` and `'`', !node.value.includes("'"));
quote = "'";
}

@ef4
Copy link
Contributor Author

ef4 commented Feb 26, 2025

Yeah, this general strategy seems fine, I didn't love the idea of relying on loc.

no way to print this currently; must be a bad AST transform?

I think the HTML answer here is entity encoding, and there are already other tests that are exercising that. In the default mode, my example already transforms like this:

-<div class='He said "yes"' ></div>
+<div class="He said &quot;yes&quot;"></div>

which is correct, but presumably is an example of what annoyed somebody enough that they introduced mode: 'codemod' in the parser and entityEncoding: 'raw' in the printer. The bug is only in entityEncoding: 'raw' mode, and should probably be targeted at only that mode.

I’m supportive of a best-effort verbatim codemod mode if someone wants it (personally, I also don’t mind the codemod outputting whatever as long as it’s correct, then have prettier fix it), but IMO the current code isn’t intending to do that;

Unfortunately that horse left the barn many years ago. We're looking at a bug while using an existing mode named "codemod". So it absolutely does intend to do that, it just does it badly. The implementation is very confused and poorly layered (which is why it's also riddled with override points).

Of course I agree it would be great to holistically refactor the printer to be self-consistent. Somebody's gotta do it though. Until then, we have an existing "codemod mode" with a correctness bug.

@ef4 ef4 force-pushed the printer-quote-bug branch 3 times, most recently from 60406be to a6292f4 Compare February 26, 2025 16:18
@ef4
Copy link
Contributor Author

ef4 commented Feb 26, 2025

New implementation is pushed and passing tests.

  • It no longer uses source loc info.
  • It only changes behavior when using entityEncoding: 'raw' mode, which is the only case that had this bug.
  • It opportunistically uses single quotes when that avoids need to replace double quotes with &quot;
  • It falls back to doing the full encoding that the default mode uses when we encounter both kinds of quotes in one attribute (the test I added demonstrates how an AST transform can cause this to happen)

@ef4 ef4 force-pushed the printer-quote-bug branch from a6292f4 to 656817c Compare February 26, 2025 16:33
@chancancode
Copy link
Contributor

Yea, seems good to me.

For the codemod mode, reading the comments here, my impression is that the intention is for the caller (whoever wires up the codemod mode upstream) to use this:

/**
* Used to override the mechanism of printing a given AST.Node.
*
* This will generally only be useful to source -> source codemods
* where you would like to specialize/override the way a given node is
* printed (e.g. you would like to preserve as much of the original
* formatting as possible).
*
* When the provided override returns undefined, the default built in printing
* will be done for the AST.Node.
*
* @param ast the ast node to be printed
* @param options the options specified during the print() invocation
*/
override?(ast: ASTv1.Node, options: PrinterOptions): void | string;

I don’t think that’s necessarily the end consumer, but maybe some code in here/some shared thing Robert made years ago?

In any case trying to avoid HTML escaping where possible seems like a reasonable goal even with the normalizing philosophy, especially when it’s reliable. So I’m good with this regardless.

@ef4
Copy link
Contributor Author

ef4 commented Feb 26, 2025

I'm aware of the override thing, it's really only useful for one very specific use case, which is when you want to return the cached old serialization of known nodes that haven't changed. I think the only user is https://github.com/ember-template-lint/ember-template-recast/blob/3da4525e530436ea3faec6ff9b402a0fadd2e900/src/index.ts#L15-L19

I'm not doing anything like that, I don't actually care about trying to get format preservation, I'm only using "codemod mode" to get the better entity handling (which I agree should probably just be part of the default behavior anyway).

@ef4 ef4 merged commit 3b3ca18 into main Feb 26, 2025
9 checks passed
@ef4 ef4 deleted the printer-quote-bug branch February 26, 2025 20:12
@github-actions github-actions bot mentioned this pull request Feb 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants