
Null characters, should we sanitize? #60

Open
visch opened this issue Jan 11, 2023 · 1 comment

Comments

@visch
Member

visch commented Jan 11, 2023

Right now you'll get an error like `ValueError: A string literal cannot contain NUL (0x00) characters.` or
`sqlalchemy.exc.DataError: (psycopg2.errors.UntranslatableCharacter) unsupported Unicode escape sequence`, because Postgres doesn't allow NUL characters in text values; see https://www.postgresql.org/docs/current/functions-string.html#:~:text=chr(0)%20is%20disallowed%20because%20text%20data%20types%20cannot%20store%20that%20character.

Should we sanitize the data, i.e. something like `data.replace("\u0000", "")`, or leave the offending record as-is?
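A minimal sketch of both options, assuming records arrive as JSON-like Python dicts (possibly with nested dicts and lists). `strip_nul` and `contains_nul` are hypothetical helpers for illustration, not part of any existing target: the first applies the `replace` idea recursively, the second just flags a record so it could be left alone or reported instead.

```python
from typing import Any

NUL = "\u0000"  # the character Postgres text columns cannot store


def strip_nul(value: Any, replacement: str = "") -> Any:
    """Replace NUL characters in every string inside value.

    Walks nested dicts and lists; dict keys and non-string scalars
    are returned unchanged.
    """
    if isinstance(value, str):
        return value.replace(NUL, replacement)
    if isinstance(value, dict):
        return {key: strip_nul(val, replacement) for key, val in value.items()}
    if isinstance(value, list):
        return [strip_nul(item, replacement) for item in value]
    return value


def contains_nul(value: Any) -> bool:
    """Return True if any string inside value contains a NUL character."""
    if isinstance(value, str):
        return NUL in value
    if isinstance(value, dict):
        return any(contains_nul(val) for val in value.values())
    if isinstance(value, list):
        return any(contains_nul(item) for item in value)
    return False


record = {"id": 1, "name": "ab\u0000c", "tags": ["x\u0000", "y"], "meta": {"note": "\u0000"}}
print(contains_nul(record))  # True
print(strip_nul(record))
# {'id': 1, 'name': 'abc', 'tags': ['x', 'y'], 'meta': {'note': ''}}
```

Dropping the character (the default `replacement=""`) is the simplest choice; passing a visible placeholder such as `"?"` instead would make it obvious downstream that the value was altered.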

@visch visch changed the title Null characters, should we santizie? Null characters, should we sanitize? Jan 11, 2023
@williamlfish

Running into this as well. I think it would be nice to at least have the option to sanitize 😅

github-merge-queue bot pushed a commit that referenced this issue Dec 10, 2024
Null characters are currently passed as-is to Postgres despite being
unsupported.

If one is encountered, the sink fails, as noted in #60, with an error
like `ValueError: A string literal cannot contain NUL (0x00)
characters.`

This PR introduces a new option called `sanitize_null_text_characters`
which enables sanitization of these characters.
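A rough sketch of how such an opt-in flag could gate the cleanup before records are written. Only the option name `sanitize_null_text_characters` comes from the commit; `prepare_record`, `_replace_nul`, and the assumption that the option defaults to off are hypothetical and may not match the actual implementation.

```python
from typing import Any, Mapping


def _replace_nul(value: Any) -> Any:
    # Recursively drop NUL (0x00) characters from strings, including nested dicts/lists.
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        return {k: _replace_nul(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_replace_nul(v) for v in value]
    return value


def prepare_record(record: dict, config: Mapping[str, Any]) -> dict:
    # Hypothetical pre-insert hook: the option name is from the PR, the wiring is assumed.
    if not config.get("sanitize_null_text_characters", False):
        # Option off (assumed default): pass the record through untouched,
        # so Postgres may still reject values containing NUL.
        return record
    # Option on: return a cleaned copy, leaving the caller's record unmodified.
    return {key: _replace_nul(val) for key, val in record.items()}


row = {"s": "a\x00b", "n": 3}
print(prepare_record(row, {"sanitize_null_text_characters": True}))  # {'s': 'ab', 'n': 3}
```

Keeping the behavior opt-in means existing pipelines see no silent data changes unless they ask for them.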

---------

Co-authored-by: Edgar Ramírez Mondragón <[email protected]>
Co-authored-by: Edgar Ramírez-Mondragón <[email protected]>