Whitespace in pasted HTML is not handled according to the HTML spec #2713
Comments
/bounty 150 |
|
/attempt #2713 Options |
Note: The user @AayushMohan is already attempting to complete issue #2713 and claim the bounty. If you attempt to complete the same issue, there is a chance that @AayushMohan will complete the issue first, and be awarded the bounty. We recommend discussing with @AayushMohan and potentially collaborating on the same solution versus creating an alternate solution. |
To anyone wondering where in the code to implement the solution: Proper HTML parsing as per the spec should be part of the |
I encourage anyone working on this to join us on Discord and share your thoughts in the #contributing channel. Better to collaborate on one good solution than everyone racing to complete one hastily-built solution, right? 🙂 |
I found some more detailed information about how HTML parsers should handle whitespace, in case it's helpful to anyone: |
/attempt #2713 Options |
Discussion on Discord is here: https://discord.com/channels/1026227597115396188/1166709783865331723 |
@rishi-raj-jain You can run those tests like this: yarn jest packages/core/src/plugins/html-deserializer/utils/deserializeHtml.spec.tsx |
Discussion on potential approaches: https://github.com/udecode/plate/pull/2715/files#r1372355759 |
For testing purposes, here's an example of an input and its equivalent collapsed HTML. Both of these HTML strings should result in identical Slate values.
Original:
<p>
Hello world
</p>
<p>
one two
three
</p>
<pre>
hello one two
three
four
</pre>
<div style="white-space: pre">
hello one two
three
four
</div>
<div style="white-space: pre-line">
hello one two
three
four
</div>
Collapsed:
<p>Hello world</p><p>one two three</p><pre>hello one two
three
four
</pre><div style="white-space: pre">
hello one two
three
four
</div><div style="white-space: pre-line">
hello one two
three
four
</div> |
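A minimal sketch of the <p> case from the example above, in case it helps anyone writing tests. collapseBlockText is a hypothetical helper, not part of Plate, and it only covers the default white-space: normal behaviour for the text of a single block element: runs of spaces, tabs and line breaks become one space, and whitespace at the block's edges is dropped.

```typescript
// Hypothetical helper (not part of Plate): collapses the text of one block
// element under the default `white-space: normal` rules.
function collapseBlockText(text: string): string {
  return (
    text
      // Spaces, tabs and line breaks all count as collapsible whitespace.
      .replace(/[ \t\n\r\f]+/g, " ")
      // A single space left at the start or end of the block is removed.
      .replace(/^ | $/g, "")
  );
}

// Text content of the second <p> in the "Original" and "Collapsed" strings.
const originalText = "\none two\nthree\n";
const collapsedText = "one two three";

console.log(collapseBlockText(originalText) === collapseBlockText(collapsedText)); // true
```

A real implementation would also need to handle whitespace between inline siblings and the preserved modes, which this sketch deliberately leaves out.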
Just to let everyone know, it looks like there are a lot of people working on this. I'm going to leave the bounty open for a few days after the first mergeable PR is opened, and I'm going to award the bounty to whichever PR presents the highest quality solution (completeness, testing, general code quality), regardless of who submits first. If you want to avoid wasting your time, I suggest collaborating on a single solution like the Algora comment recommends. |
/attempt #2713 Options |
As I said here, I'll leave this bounty open until Thursday to give anyone else a chance to submit a PR. Please do not copy from other contributors' PRs without their permission. This will be treated as a copyright violation. |
/attempt #2713 Options |
@jkcs and I have discovered some more details about how browsers handle leading and trailing whitespace in |
Thanks everyone for your time. We'll no longer be accepting new PRs for this issue. The bounty will be awarded to @jkcs once the last few issues are resolved. |
💡 @jkcs submitted a pull request that claims the bounty. You can visit your org dashboard to reward. |
@jkcs: Your claim has been rewarded! We'll notify you once it is processed. |
🎉🎈 @jkcs has been awarded $150! 🎈🎊 |
Note to anyone attempting the bounty: Please read the Expected Behaviour section carefully and write tests to check that all cases are implemented correctly. Discuss it with us via this issue or Discord if anything is unclear.
Deadline: Thursday 2 November 2023
Clarifications:
Description
When pasting HTML into Plate via the deserialize HTML plugin, the pasted HTML is parsed inside getFragment using parseHtmlDocument before being passed as an HTML element to deserializeHtml. As a result of the logic inside deserializeHtml, the stripWhitespace option is ignored when the given argument is an HTML element instead of a string, hence whitespace is not stripped from pasted HTML.
Note that this is the intended behaviour in some circumstances, such as <pre> tags, but not all. See Expected Behaviour.
Context
When copying HTML from Firefox, Firefox inserts additional line feed characters (\n) at regular intervals. For example, the same paragraph with no newlines becomes the following when opened in and copied from Firefox:
This bug results in those same line feed characters appearing inside text nodes when pasting into Plate. While this browser quirk is specific to Firefox, the HTML pasting bug in Plate applies to all browsers.
Steps to Reproduce
test.html
Expected Behavior
HTML should be parsed as per the HTML spec:
Additionally, this behaviour should be modifiable using the white-space CSS property, regardless of whether this property is included explicitly using a style prop or implicitly through default browser styles. (The <pre> element applies an implicit white-space: pre style.)
See MDN's docs for a complete description of each possible value.
Finally, as per the HTML spec, 4.4.3 The pre element: "In the HTML syntax, a leading newline character immediately following the pre element start tag is stripped."
Environment