mattermoran

Fixes an issue where multibyte UTF-8 characters such as ö or emoji were corrupted when their byte sequence was split across multiple stream chunks.

Previously, XmlTransformStream called textDecoder.decode(chunk) on each chunk independently. When a multibyte character was split across chunks, the decoder treated each incomplete half as an invalid sequence and replaced it with the replacement character U+FFFD (�).
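
A minimal sketch of the failure mode, using only the standard TextEncoder/TextDecoder APIs. The chunk boundary and the `decodePerChunk` helper are hypothetical and chosen for illustration; this is not the XmlTransformStream code itself.

```typescript
// "ö" encodes to two bytes in UTF-8 (0xC3 0xB6), so "ööö" is six bytes.
const sourceBytes = new TextEncoder().encode("ööö");

// Hypothetical chunk boundary that lands in the middle of the second "ö".
const firstChunk = sourceBytes.slice(0, 3); // C3 B6 C3
const secondChunk = sourceBytes.slice(3); // B6 C3 B6

// Decodes every chunk on its own, without the { stream: true } option.
function decodePerChunk(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  return chunks.map((chunk) => decoder.decode(chunk)).join("");
}

const corrupted = decodePerChunk([firstChunk, secondChunk]);
console.log(corrupted); // the split "ö" becomes two U+FFFD characters
```

The dangling 0xC3 at the end of the first chunk and the orphaned 0xB6 at the start of the second each turn into one U+FFFD.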

The fix is to call textDecoder.decode(chunk, { stream: true }), which tells the decoder to buffer an incomplete trailing sequence until the next chunk arrives.
The final textDecoder.decode() call in flush emits anything that is still buffered.
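
The same idea as a small sketch. `Utf8ChunkDecoder` is a hypothetical stand-in that only mirrors the transform/flush shape described here; it is not the actual XmlTransformStream implementation, and the chunk boundary is again made up for illustration.

```typescript
// Hypothetical helper mirroring a transformer's transform/flush pair.
class Utf8ChunkDecoder {
  private readonly decoder = new TextDecoder("utf-8");

  // Decode one chunk; an incomplete trailing sequence stays buffered.
  transform(chunk: Uint8Array): string {
    return this.decoder.decode(chunk, { stream: true });
  }

  // End of input: decode() with no arguments emits whatever is left.
  flush(): string {
    return this.decoder.decode();
  }
}

const encoded = new TextEncoder().encode("ööö");
// Same hypothetical split: the boundary falls inside the second "ö".
const parts = [encoded.slice(0, 3), encoded.slice(3)];

const chunkDecoder = new Utf8ChunkDecoder();
let streamed = "";
for (const part of parts) {
  streamed += chunkDecoder.transform(part);
}
streamed += chunkDecoder.flush();
console.log(streamed);
```

If the input ends in the middle of a character, the flush call is what surfaces the leftover bytes (as U+FFFD) instead of silently dropping them.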

Before: ö��ö
After: ööö

