
Multisearch response unmarshalling loses nano precision from numeric timestamps #118

Open
ddelemeny opened this issue Apr 10, 2024 · 1 comment · May be fixed by #122

Comments

@ddelemeny
Contributor

This issue is proving complex to tackle, so it's best to keep a running description of the situation here.

The problem:

// TODO: understand why we get a float64?

When creating a data frame from the response to a search query, the response parser finds numeric timestamp values already cast to float64. This cast is unwanted: float64 has only a 53-bit mantissa, so it cannot represent nanosecond-precision epoch timestamps (which need up to 63 bits) without rounding.

Where it happens:

err = dec.Decode(&msr)

The JSON response from the API gets unmarshalled early in the processing pipeline, and the json decoder used in ExecuteMultisearch decodes numbers to float64 by default.

The solution:

The decoder can be told to unmarshal numbers into a "polymorphic" type, json.Number, which lets further processing decide the actual type of each datum down the line.
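A minimal sketch of the difference, using the standard library's `Decoder.UseNumber()` (field name `ts` and the helper functions are illustrative, not the plugin's actual code):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// decodeLossy mimics the current pipeline: every JSON number lands in a
// float64, whose 53-bit mantissa rounds nanosecond epoch timestamps.
func decodeLossy(payload []byte) int64 {
	var doc map[string]any
	if err := json.Unmarshal(payload, &doc); err != nil {
		panic(err)
	}
	return int64(doc["ts"].(float64))
}

// decodeExact calls UseNumber so the literal survives as a json.Number
// (a string under the hood) and can be parsed as int64 later.
func decodeExact(payload []byte) int64 {
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.UseNumber()
	var doc map[string]any
	if err := dec.Decode(&doc); err != nil {
		panic(err)
	}
	i, err := doc["ts"].(json.Number).Int64()
	if err != nil {
		panic(err)
	}
	return i
}

func main() {
	payload := []byte(`{"ts":1712744130123456789}`)
	fmt.Println(decodeLossy(payload)) // rounded value, last digits wrong
	fmt.Println(decodeExact(payload)) // 1712744130123456789
}
```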

The other problem:

Changing from a simple type to a polymorphic one in a complex, multi-branched processing pipeline tends to break a lot of things.
Reworking the whole call tree of a Multisearch to handle json.Number instead of float64 would be an absurd chore.

The more complete solution?

A reasonable approach would be to perform a shallow unmarshal early in the process, just enough to dispatch responses to sub-handlers, then do a second unmarshal inside each sub-handler, where it is appropriate to decide whether the polymorphic type is needed (i.e. when timestamps are involved). response_parser.go:parseResponse looks like a good candidate.

@ddelemeny
Contributor Author

ddelemeny commented May 15, 2024

The sort key seems to suffer from precision loss too, resulting in a possibly shuffled ascending order: the dataframe's nanos field received from the backend has incorrect values.

Need more love.

Edit: the dataframe nanos don't come from the sort key but from the timestamp.
The timestamp output formats currently available in Quickwit do not represent nanos accurately; this needs to be addressed first.
