8354273: Restore even more pointless unicode characters to ASCII #24567

Open: magicus wants to merge 2 commits into master

magicus (Member) commented Apr 10, 2025

As a follow-up to JDK-8354213, I found some additional places where Unicode characters are unnecessarily used instead of pure ASCII.
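
To illustrate the kind of check that surfaces such characters, here is a minimal sketch that scans a string for code points outside the ASCII range. It is not the tooling used for this PR; the class name `NonAsciiScanner` and the method `findNonAscii` are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the JDK or of this PR.
public class NonAsciiScanner {

    // Returns one entry per non-ASCII code point, formatted as "index:U+XXXX",
    // where index is the char offset of the code point in the input.
    public static List<String> findNonAscii(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                hits.add(i + ":" + String.format("U+%04X", cp));
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        // The section sign is written as an escape so this file stays pure ASCII.
        System.out.println(findNonAscii("Disclaimers \u00A7anchor"));
    }
}
```

Running the same loop over each line of a source file would give a list of candidate locations to review by hand.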


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8354273: Restore even more pointless unicode characters to ASCII (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24567/head:pull/24567
$ git checkout pull/24567

Update a local copy of the PR:
$ git checkout pull/24567
$ git pull https://git.openjdk.org/jdk.git pull/24567/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24567

View PR using the GUI difftool:
$ git pr show -t 24567

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24567.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Apr 10, 2025

👋 Welcome back ihse! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Apr 10, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk bot added the rfr (Pull request is ready for review) label Apr 10, 2025

openjdk bot commented Apr 10, 2025

@magicus The following labels will be automatically applied to this pull request:

  • client
  • core-libs
  • i18n

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge bot commented Apr 10, 2025

Webrevs

@@ -26,7 +26,7 @@ modifications:
[$year-of-document] World Wide Web Consortium.
https://www.w3.org/copyright/software-license-2023/"

Disclaimers §anchor

Member Author

This is an incorrectly copied piece of HTML; compare how the very same license is handled in e.g. src/java.xml/share/legal/schema10part1.md. The § is the non-ASCII character that triggered my detection of this, but the entire "anchor" string is incorrect here.

@@ -47,7 +47,7 @@ The notice is:
"Copyright © 2023 W3C®. This software or document includes material copied from
or derived from [title and URI of the W3C document]."

Disclaimers §anchor

Contributor

Did that come from an upstream file?

Member Author

No, it is copy/pasted from a textual rendering of the HTML file specified in the URL above. This is what you get if you naïvely select the text in Firefox and press Ctrl-C. The §anchor part is not rendered on screen.

@@ -189,7 +189,7 @@ private static String toUniformString(double value) {
         int DIGIT_COUNT = 40;
         String str = decimal.toPlainString();
         if (str.length() >= DIGIT_COUNT) {
-            str = str.substring(0,DIGIT_COUNT-1)+"…";
+            str = str.substring(0,DIGIT_COUNT-1)+"...";
         }

Contributor

How did you test this? Please say more than tiers 1-3, because this test isn't run until tier4.

Member Author

I did not test tier4. Will do so now. Thanks!
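
For reference, the truncation logic touched by this hunk can be sketched in isolation. Only the `DIGIT_COUNT` value and the `substring` expression come from the diff above; the wrapper class `TruncateDemo` and the `truncate` method are hypothetical names introduced for this example, and the rest of `toUniformString` is not reproduced.

```java
// Hypothetical stand-alone sketch; not the test code from the PR.
public class TruncateDemo {

    static final int DIGIT_COUNT = 40;

    // Cuts strings of DIGIT_COUNT or more characters down to DIGIT_COUNT - 1
    // characters and marks the cut with three ASCII periods.
    public static String truncate(String str) {
        if (str.length() >= DIGIT_COUNT) {
            str = str.substring(0, DIGIT_COUNT - 1) + "...";
        }
        return str;
    }

    public static void main(String[] args) {
        String shortened = truncate("1234567890".repeat(5));
        System.out.println(shortened + " (" + shortened.length() + " chars)");
    }
}
```

One consequence of using three ASCII periods is that a truncated string is 42 characters long, two more than it would be with a single-character ellipsis.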

eirbjo (Contributor) commented Apr 18, 2025

While the changes here look okay, I think the issue/PR title could be improved.

The replacement of the Unicode "En Dash" with the ASCII hyphen-minus, and the similar replacement of the Unicode "Horizontal Ellipsis" with three ASCII periods, are not really "restoring" much, and these Unicode characters are hardly "pointless", as they may carry different semantic meaning, behavior and rendering.

It's a valid choice to normalize them into ASCII, but perhaps a title like "Replace even more Unicode characters with ASCII" would be more "fair" to these poor Unicode characters :-)
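
The two substitutions mentioned in this comment can be sketched as a simple string mapping. This is an illustration only, assuming the En Dash is U+2013 and the Horizontal Ellipsis is U+2026; the class `AsciiNormalizer` and the method `toAscii` are hypothetical and do not appear in the PR.

```java
// Hypothetical illustration; the PR edits the affected files directly.
public class AsciiNormalizer {

    // Replaces En Dash (U+2013) with hyphen-minus and
    // Horizontal Ellipsis (U+2026) with three periods.
    public static String toAscii(String s) {
        return s.replace("\u2013", "-")
                .replace("\u2026", "...");
    }

    public static void main(String[] args) {
        // Escapes keep this source file itself pure ASCII.
        System.out.println(toAscii("tiers 1\u20133\u2026"));
    }
}
```

As the comment notes, such a mapping is lossy: the ASCII forms may render and behave differently from the characters they replace.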
