-
Notifications
You must be signed in to change notification settings - Fork 989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for commas in field names #1896
base: main
Are you sure you want to change the base?
Conversation
I'm wondering, would it be easier if we just HTTP encoded the data when concatenating it like this? Do we display concatenated information anywhere? If so, maybe we just HTTP encode the fields before concatenating them for the field setting? |
I had thought about base64 encoding them but figured it would be nice to keep them readable. HTTP encoding would be a good balance. I'll see if there are any problems with that. |
That works nicely for my tests, and is a good bit less code. The only downside is that it runs on every call to tokenize() and concatenate() rather than only calls on strings with the delimiter. I'm not sure if we'll run into any cases where that will hurt us (for example things getting double-http-encoded or something like that). |
Actually the http encoding wound up making me a little too nervous. It broke one unit test (HttpEncodingToolsTest). But worse, it does something that I don't think most people would expect of methods called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we pick this back up? Should probably avoid letting it fall between the cracks.
boolean inQuotedToken = false; | ||
for (char character : string.toCharArray()) { | ||
if (character == '\"') { | ||
inQuotedToken = !inQuotedToken; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think just going in an encoding/decoding the fields where needed would be simpler. For instance, this logic now breaks if we include quotation marks in the field names. Granted, that's probably even more unlikely to happen than commas, but if we're talking about respecting the tokenize method contracts, it's still broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to come back to this when I have a little time to get into it. I don't remember exactly what I meant by
it does something that I don't think most people would expect of methods called tokenize and concatenate
But I remember it being bad enough that it seemed like a deal-killer for encoding/decoding the fields.
This commit adds support for commas in field names, which is allowed by Elasticsearch.
Closes #1380