fix: preserve utf8 multibyte characters split across chunks in XmlTransformStream #3

mattermoran · 2025-08-26T20:25:26Z

Fixes an issue where multibyte UTF-8 characters such as ö or emoji were corrupted when their byte sequence was split across multiple stream chunks.

Previously, XmlTransformStream called textDecoder.decode(chunk) on each chunk independently. If a multibyte character was split across chunks, the decoder treated each incomplete part as an invalid sequence and replaced it with the replacement character U+FFFD (�).
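A minimal reproduction of that behavior, using only the standard TextEncoder and TextDecoder APIs. The variable names are illustrative and nothing here is taken from the XmlTransformStream source:

```typescript
// "ö" is encoded in UTF-8 as two bytes: 0xC3 0xB6.
const bytes = new TextEncoder().encode("ö");
const first = bytes.subarray(0, 1); // 0xC3, the lead byte only
const second = bytes.subarray(1);   // 0xB6, the continuation byte only

// Per-chunk decoding: every call is treated as a complete input,
// so the lone lead byte and the lone continuation byte are each invalid.
const perChunk = new TextDecoder();
const broken = perChunk.decode(first) + perChunk.decode(second);
// broken === "\uFFFD\uFFFD"

// Streaming decoding: the lead byte is held back until its continuation arrives.
const streaming = new TextDecoder();
const fixed =
  streaming.decode(first, { stream: true }) +
  streaming.decode(second, { stream: true }) +
  streaming.decode(); // end of input
// fixed === "ö"
```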

The fix is to call textDecoder.decode(chunk, { stream: true }), which tells the decoder to buffer an incomplete trailing sequence until the next chunk arrives.
The final textDecoder.decode() call in flush emits anything still buffered when the stream ends.

Before: ö��ö
After: ööö
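The transform and flush steps described above can be sketched as follows. This is an assumption about the general shape of the fix, not the actual diff: createDecodingTransformer and the hand-driven sink are hypothetical names introduced for illustration, and the real XmlTransformStream also does XML processing that is omitted here.

```typescript
// Hypothetical helper: a transformer-shaped object with the same
// transform/flush pair a TransformStream would be constructed with.
type Sink = { enqueue(text: string): void };

function createDecodingTransformer() {
  // One decoder per stream, so buffered bytes carry over between chunks.
  const textDecoder = new TextDecoder("utf-8");
  return {
    transform(chunk: Uint8Array, controller: Sink): void {
      // stream: true keeps an incomplete trailing sequence buffered.
      const text = textDecoder.decode(chunk, { stream: true });
      if (text.length > 0) controller.enqueue(text);
    },
    flush(controller: Sink): void {
      // Calling decode() with no arguments signals end of input.
      const rest = textDecoder.decode();
      if (rest.length > 0) controller.enqueue(rest);
    },
  };
}

// Drive the transformer by hand with "ööö" split in the middle of the second ö.
const out: string[] = [];
const sink: Sink = { enqueue: (text) => { out.push(text); } };
const transformer = createDecodingTransformer();
const input = new TextEncoder().encode("ööö"); // C3 B6 C3 B6 C3 B6

transformer.transform(input.subarray(0, 3), sink); // C3 B6 C3
transformer.transform(input.subarray(3), sink);    // B6 C3 B6
transformer.flush(sink);

const result = out.join("");
// result === "ööö"
```

Splitting the six bytes after the third one is the case where per-chunk decoding would yield the Before output shown above, since both halves of the middle ö are decoded in isolation.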

fix: preserve utf8 multibyte characters split across chunks in XmlTra…

bb493a0