Bug: For multi-byte characters like Chinese, some output encodings can cause incorrect text rendering. #18242

Open
abgox opened this issue Nov 24, 2024 · 3 comments
Labels
Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting

Comments

@abgox

abgox commented Nov 24, 2024

Windows Terminal version

1.21.3231.0

Windows build number

10.0.22635.0

Other Software

PowerShell

Steps to reproduce

  • I cleared the contents of $profile and added a test function to it.

    function test-render() {
        $text = @("你好你好你好", "😄😎🤔")
    
        $buffer = $Host.UI.RawUI.NewBufferCellArray($text, 'Cyan', 'Black')
        $Host.UI.RawUI.SetBufferContents($Host.UI.RawUI.CursorPosition, $buffer)
    
        $null = $host.UI.RawUI.ReadKey() # Suspend the process for easy observation
    }
  • There are two text rendering issues here.


  1. Emoji do not render properly in Windows Terminal Preview, but they render fine in other terminals such as Windows Terminal, Tabby, and Hyper.

Note

I'm in China, so the output encoding of Windows Terminal (Preview) is automatically set to GB2312 (code page 936).

Other terminals like Tabby and Hyper, however, use UTF-8 encoding.



  2. When the output encoding is switched to UTF-8, Windows Terminal (Preview) renders Chinese and other multi-byte characters incorrectly, but they render fine in other terminals like Tabby and Hyper.

Note

Other terminals like Tabby and Hyper work fine because they always use UTF-8 encoding.


  • I switched the encoding to UTF-8 through the Windows regional settings.
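The GB2312 vs. UTF-8 notes above can be illustrated with a small Python sketch. It uses Python's standard `gb2312` and `utf-8` codecs, not anything Windows Terminal itself runs, and the helper `encoded_length` is hypothetical. The Chinese test string is representable in both encodings (2 bytes per character in GB2312, 3 in UTF-8), while the emoji have no GB2312 representation at all.

```python
def encoded_length(text: str, codec: str):
    """Return the byte length of text in codec, or None if codec cannot represent it."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None

chinese = "你好你好你好"
emoji = "😄😎🤔"

for codec in ("gb2312", "utf-8"):
    print(codec, encoded_length(chinese, codec), encoded_length(emoji, codec))
# gb2312 12 None
# utf-8 18 12
```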

Expected Behavior

  1. Emoji render properly in Windows Terminal Preview.
  2. Multi-byte characters like Chinese render correctly, with no spurious spaces added.

Actual Behavior

  1. Emoji do not render properly in Windows Terminal Preview.
  2. For multi-byte characters like Chinese, spaces are incorrectly inserted.
@abgox abgox added Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Nov 24, 2024
@lhecker
Member

lhecker commented Nov 26, 2024

Unfortunately, it was never specified whether the BufferCell type supports "surrogate pairs" or not (which is what your 3 emojis use). It actually never supported them properly and it simply worked for SetBufferContents coincidentally, because there was no input validation. You could write anything into the text buffer, even completely bogus code points, and it would just work. Now we validate all inputs and so this doesn't work anymore. BufferCell now only supports UCS-2, which is all it ever properly supported.
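To make "only supports UCS-2" concrete, here is a minimal Python sketch, assuming a cell holds exactly one 16-bit UTF-16 code unit. Characters outside the Basic Multilingual Plane, such as the three test emoji, become a high/low surrogate pair, and a cell-by-cell validator would reject those units. The function names are hypothetical; this is an illustration, not ConPTY's actual validation code.

```python
def utf16_code_units(text: str) -> list[int]:
    """Split text into the 16-bit code units a Windows wide string would contain."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def is_surrogate(unit: int) -> bool:
    # High surrogates are 0xD800-0xDBFF, low surrogates 0xDC00-0xDFFF.
    return 0xD800 <= unit <= 0xDFFF

def fits_in_ucs2(text: str) -> bool:
    """True if every character is a single 16-bit unit, i.e. there are no surrogate pairs."""
    return not any(is_surrogate(u) for u in utf16_code_units(text))

print(fits_in_ucs2("你好你好你好"))               # True
print(fits_in_ucs2("😄😎🤔"))                     # False
print([hex(u) for u in utf16_code_units("😄")])  # ['0xd83d', '0xde04']
```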

The only APIs that support writing Unicode to the console are WriteConsoleW, as well as WriteFile and WriteConsoleA with SetConsoleOutputCP(CP_UTF8).
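As a sketch of the two supported paths, the following Python `ctypes` code calls `WriteConsoleW` with a wide string, and alternatively calls `SetConsoleOutputCP(CP_UTF8)` followed by `WriteFile` with UTF-8 bytes. It assumes it runs on Windows attached to a real console, omits error handling, and the helper names are hypothetical. One detail worth noting: `WriteConsoleW` counts UTF-16 code units, so a Python `len()` (which counts code points) would be too short for emoji.

```python
import ctypes
import sys

CP_UTF8 = 65001                 # code page identifier for UTF-8
STD_OUTPUT_HANDLE = 0xFFFFFFF5  # (DWORD)-11, the console output handle

def utf16_length(text: str) -> int:
    """Number of UTF-16 code units, which is what WriteConsoleW counts."""
    return len(text.encode("utf-16-le")) // 2

def _kernel32():
    # Only available on Windows; declare signatures so handles are not truncated.
    k = ctypes.WinDLL("kernel32", use_last_error=True)
    k.GetStdHandle.argtypes = [ctypes.c_uint32]
    k.GetStdHandle.restype = ctypes.c_void_p
    k.WriteConsoleW.argtypes = [ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_uint32,
                                ctypes.POINTER(ctypes.c_uint32), ctypes.c_void_p]
    k.WriteFile.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_uint32,
                            ctypes.POINTER(ctypes.c_uint32), ctypes.c_void_p]
    k.SetConsoleOutputCP.argtypes = [ctypes.c_uint32]
    return k

def write_console_w(text: str) -> None:
    """Path 1: pass UTF-16 directly to WriteConsoleW."""
    k = _kernel32()
    written = ctypes.c_uint32(0)
    k.WriteConsoleW(k.GetStdHandle(STD_OUTPUT_HANDLE), text, utf16_length(text),
                    ctypes.byref(written), None)

def write_file_utf8(text: str) -> None:
    """Path 2: set the console output code page to UTF-8, then write raw UTF-8 bytes."""
    k = _kernel32()
    k.SetConsoleOutputCP(CP_UTF8)  # note: this changes the code page for the whole console
    data = text.encode("utf-8")
    written = ctypes.c_uint32(0)
    k.WriteFile(k.GetStdHandle(STD_OUTPUT_HANDLE), data, len(data),
                ctypes.byref(written), None)

if __name__ == "__main__" and sys.platform == "win32":
    write_console_w("你好 😄\n")
    write_file_utf8("你好 😄\n")
```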

You can read more about our breaking changes here and the reason for doing them: https://github.com/microsoft/terminal/wiki/Console:-Potential-Breaking-Changes
The one that affects you is the first bullet point (CHAR_INFO). Specifically, it's this PR that (intentionally) broke your code: #13321
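The CHAR_INFO point can be seen from the struct's layout. The Python `ctypes` mirror below is an illustration under assumptions (WCHAR and WORD are modeled as `c_uint16`; the helper `to_char_infos` is hypothetical). Each CHAR_INFO stores a single 16-bit `UnicodeChar`, so an emoji can only be spread over two cells as two surrogate halves, which input validation now rejects. The helper also ignores the fact that wide CJK characters occupy two cells in a real console.

```python
import ctypes

FOREGROUND_BLUE = 0x0001
FOREGROUND_GREEN = 0x0002
FOREGROUND_INTENSITY = 0x0008
CYAN_ON_BLACK = FOREGROUND_BLUE | FOREGROUND_GREEN | FOREGROUND_INTENSITY  # 0x0B, no BACKGROUND_* bits

class _Char(ctypes.Union):
    # Mirrors the Char union of CHAR_INFO: one WCHAR or one CHAR.
    _fields_ = [("UnicodeChar", ctypes.c_uint16),  # WCHAR is 16 bits on Windows
                ("AsciiChar", ctypes.c_char)]

class CHAR_INFO(ctypes.Structure):
    _fields_ = [("Char", _Char), ("Attributes", ctypes.c_uint16)]

def to_char_infos(text: str, attributes: int) -> list:
    """Naively fill one CHAR_INFO per UTF-16 code unit (ignores double-width cells)."""
    data = text.encode("utf-16-le")
    return [CHAR_INFO(_Char(UnicodeChar=int.from_bytes(data[i:i + 2], "little")), attributes)
            for i in range(0, len(data), 2)]

print(ctypes.sizeof(CHAR_INFO))                  # 4
cells = to_char_infos("😄", CYAN_ON_BLACK)
print([hex(c.Char.UnicodeChar) for c in cells])  # ['0xd83d', '0xde04']
```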

I apologize for the issues that this has caused for you. Please let me know if you have any questions!

@abgox
Author

abgox commented Nov 27, 2024

  2. When the output encoding is switched to UTF-8, Windows Terminal (Preview) renders Chinese and other multi-byte characters incorrectly, but they render fine in other terminals like Tabby and Hyper.
  • So for the second issue, is there a known reason or a solution?

@lhecker
Member

lhecker commented Nov 27, 2024

We don't just maintain Windows Terminal but also all other parts of the console subsystem of Windows. One such component is "ConPTY" which is a translation layer from traditional console APIs like SetConsoleCursorPosition (= $Host.UI.RawUI.CursorPosition) to more modern VT sequences (= "`e[${y};${x}H"). This translation layer is used by Tabby and Hyper and also used by Windows Terminal.
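A rough Python sketch of that kind of translation, under stated assumptions: console `COORD` values are 0-based while the VT CUP sequence (`ESC [ row ; col H`) is 1-based, and the color mapping (SGR 96 for bright cyan foreground, 40 for black background) is an assumed choice. The function names are hypothetical and this is not ConPTY's implementation.

```python
ESC = "\x1b"

def cursor_position_to_vt(x: int, y: int) -> str:
    """Translate a 0-based console COORD into a 1-based VT CUP sequence."""
    return f"{ESC}[{y + 1};{x + 1}H"

def write_at_vt(x: int, y: int, text: str, fg_sgr: int = 96, bg_sgr: int = 40) -> str:
    """Move the cursor, set colors with SGR, write text, then reset attributes."""
    return f"{cursor_position_to_vt(x, y)}{ESC}[{fg_sgr};{bg_sgr}m{text}{ESC}[0m"

print(repr(cursor_position_to_vt(0, 0)))  # '\x1b[1;1H'
print(repr(write_at_vt(4, 2, "你好")))
```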

The difference now is that Windows Terminal always bundles the latest version of ConPTY, while Tabby and Hyper use whatever version Windows comes with (which may be a few years behind). If you update to Windows 11 24H2 (build 26100), your Windows installation should have a version of ConPTY that performs input validation, and then Tabby and Hyper will show the same issue.

Edit: If it's any consolation, using SetBufferContents already didn't work for most emojis, even before this breaking change, due to zero-width joiners. 🧑🏻‍❤️‍🧑🏼, for instance, is 12 UTF-16 code units (12 .NET chars) long but only occupies 2 cells.
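The length claim can be checked in Python, assuming the emoji above is the sequence ADULT + light skin tone, ZWJ, HEAVY BLACK HEART + VS16, ZWJ, ADULT + medium-light skin tone (written with escapes here to avoid ambiguity). It is 8 code points but 12 UTF-16 code units, joined by two zero-width joiners into one glyph two cells wide, so there is no way to hand it to a one-unit-per-cell API.

```python
import unicodedata

couple = "\U0001F9D1\U0001F3FB\u200D\u2764\uFE0F\u200D\U0001F9D1\U0001F3FC"

print(len(couple))                           # 8 code points
print(len(couple.encode("utf-16-le")) // 2)  # 12 UTF-16 code units
print(couple.count("\u200d"))                # 2 zero-width joiners
for ch in couple:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "?"))
```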
