Isn't csv a text-based rather than binary(bytestring)-based format? #202

Open
tysonzero opened this issue Sep 16, 2021 · 2 comments

@tysonzero

The spec appears to only mention text, and the specific binary encoding / charset of that text seems out of scope.

Accordingly it seems to me as though cassava should generally be dealing with Text instead of ByteString, perhaps with a Data.Csv.Utf8 module for just directly treating ByteString values as encoded utf-8 text.
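For illustration only, a Text-based entry point of the kind proposed here could be sketched as a thin layer over the existing ByteString API. The names `decodeText` and `encodeText` are hypothetical and are not part of cassava; `decode`, `encode`, and `HasHeader` are the existing `Data.Csv` exports. This is a sketch under the assumption that encoding to UTF-8 and back is acceptable; a real Text-native parser might look quite different.

```haskell
import Data.Csv (FromRecord, HasHeader (..), ToRecord, decode, encode)
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import Data.Text.Encoding.Error (UnicodeException)
import qualified Data.Vector as V

-- Hypothetical wrapper: accept Text, encode it as UTF-8, and hand the
-- bytes to cassava's existing ByteString-based decoder.
decodeText :: FromRecord a => HasHeader -> TL.Text -> Either String (V.Vector a)
decodeText hasHeader = decode hasHeader . TLE.encodeUtf8

-- Hypothetical wrapper: encode with cassava, then decode the bytes as UTF-8.
-- Decoding can fail if a record contains ByteString fields that are not
-- valid UTF-8, so the result is wrapped in Either.
encodeText :: ToRecord a => [a] -> Either UnicodeException TL.Text
encodeText = TLE.decodeUtf8' . encode
```

With a wrapper like this, the caller never sees a ByteString, and the choice of on-disk encoding becomes a separate step outside the CSV layer.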

@jchia

jchia commented Dec 21, 2022

An ASCII delimiter in the undecoded ByteString corresponds to a delimiter in the UTF-8-decoded Text, so with UTF-8 input there is no risk of misidentifying delimiters.
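The reason is that every byte of a multi-byte UTF-8 sequence is 0x80 or above, so the comma byte 0x2C can only ever be an actual comma. A minimal sketch of that property, using only the `bytestring` and `text` packages (the function names are made up for this example, and quoting rules are ignored on purpose):

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

-- Split on the comma byte (0x2C) after UTF-8 encoding, then decode each piece.
-- The pieces are always valid UTF-8 because 0x2C never occurs inside a
-- multi-byte sequence.
splitViaBytes :: T.Text -> [T.Text]
splitViaBytes = map decodeUtf8 . B.split 0x2C . encodeUtf8

-- Split on the comma character directly.
splitViaText :: T.Text -> [T.Text]
splitViaText = T.splitOn (T.pack ",")
```

For any non-empty input the two functions should return the same fields. The empty string is an edge case of the two split functions themselves (`B.split` yields no pieces, `T.splitOn` yields one empty piece) and has nothing to do with encoding.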

However, the user is forced to use UTF-8 if there are Text/ShortText/Char fields, because cassava assumes UTF-8 for them. To use another text encoding, they need to use ByteString fields and do the ByteString-to-Text conversion separately. Alternatively, they can transcode between UTF-8 and the other text encoding, passing UTF-8-encoded ByteStrings when interfacing with cassava.
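Both workarounds can be sketched for Latin-1 input, which is assumed here because `Data.Text.Encoding.decodeLatin1` covers it without extra dependencies and because Latin-1 is single-byte and ASCII-compatible, so delimiter bytes stay unambiguous. The function names are hypothetical; other encodings would need a transcoding library.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (..), decode)
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, encodeUtf8)
import qualified Data.Vector as V

-- Workaround 1: parse fields as raw ByteStrings, then decode each field
-- from Latin-1 outside of cassava.
viaByteStringFields :: BL.ByteString -> Either String (V.Vector (V.Vector T.Text))
viaByteStringFields input = V.map (V.map decodeLatin1) <$> rows
  where
    rows :: Either String (V.Vector (V.Vector B.ByteString))
    rows = decode NoHeader input

-- Workaround 2: transcode the whole input from Latin-1 to UTF-8 first,
-- then let cassava decode Text fields as usual.
viaTranscoding :: BL.ByteString -> Either String (V.Vector (V.Vector T.Text))
viaTranscoding =
  decode NoHeader . BL.fromStrict . encodeUtf8 . decodeLatin1 . BL.toStrict
```

The second variant materialises the whole input as strict Text before parsing, so it gives up streaming; the first keeps cassava's parsing unchanged and only pays for decoding the fields that are actually needed.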

I have no idea about the performance characteristics of each alternative, though, including the proposed Data.Csv.Utf8.

@tysonzero
Author

To be clear, Data.Csv.Utf8 would just be the current implementation. The module name should make it clear that passing UTF-8 for the ByteString arguments is safe, and that non-UTF-8 arguments can hit edge cases and require additional care.
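One concrete edge case, as an illustration: in UTF-16BE the character U+012C is encoded as the bytes 0x01 0x2C, and the second byte is indistinguishable from an ASCII comma to a byte-level parser. The helper name below is made up for this sketch; the encoders come from `Data.Text.Encoding`.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf16BE, encodeUtf8)

-- A single character, U+012C, containing no comma at the text level.
sample :: T.Text
sample = T.singleton '\x12C'

-- Count bytes equal to the ASCII comma (0x2C) under a given encoding.
commaBytes :: (T.Text -> B.ByteString) -> T.Text -> Int
commaBytes encoder = B.count 0x2C . encoder
```

Here `commaBytes encodeUtf8 sample` finds no comma byte, while `commaBytes encodeUtf16BE sample` finds one, which a byte-oriented CSV parser would treat as a field separator.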

I am also unsure how a Text-based alternative (Data.Csv, Data.Csv.Text, or whatever it ends up being called) would change performance.
