Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT #4183
Looking good. I think this could do with some end-to-end test coverage, perhaps by extending test_primitive_single_column_reader_test or something similar.
It would also be really nice to get an integration test, ideally with a file generated by pyarrow or similar. I'm not sure whether parquet-testing has such a file yet; that might be something to contribute there.
@@ -1882,6 +1935,7 @@ mod tests {
encoder.put(&v[..]).expect("ok to encode");
}
let bytes = encoder.flush_buffer().expect("ok to flush buffer");
println!("{:x?}", bytes.data());
?
Marking as draft while waiting for feedback; feel free to mark as ready for review when you would like me to take another look.
Thank you, that's a pure oversight. Filed #5051. FWIW, I believe this is the only one that isn't supported.
I can confirm that I tried all the encodings described in https://parquet.apache.org/docs/file-format/data-pages/encodings/ that can be applied to integers and floating-point values, and only one of them does not work with the Rust implementation. I've never tried the encodings for the BYTE_ARRAY type.
PR #5053
@tustvold What's holding this PR up? I'm also encountering the issue that byte_stream_split is unsupported. I'm willing to make a PR of my own to do this if the problem is that @simonvandel is unresponsive.
Always happy to review PRs. IIRC, the major thing this PR was missing was adequate test coverage.
@mwlon my need for this encoding disappeared, and so did my motivation to finish it. If you want to continue, feel free to do so.
I've created a parquet-testing PR to facilitate this: apache/parquet-testing#45
parquet-testing PR is in; new PR for BYTE_STREAM_SPLIT implementation: #5293
Closed by #5293
Which issue does this PR close?
Closes #4102.
Rationale for this change
What changes are included in this PR?
Implements decoding and encoding of BYTE_STREAM_SPLIT for f32 and f64.
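For readers unfamiliar with the encoding, here is a minimal standalone sketch of what BYTE_STREAM_SPLIT does for f32, following the Parquet format description: byte k of every little-endian value is written to stream k, and the streams are concatenated. The function names `byte_stream_split_encode` and `byte_stream_split_decode` are hypothetical and chosen for illustration; this is not the code in this PR and does not use the parquet crate's encoder or decoder types.

```rust
/// Width in bytes of one f32 value, i.e. the number of byte streams.
const WIDTH: usize = std::mem::size_of::<f32>();

/// Illustrative encoder (hypothetical helper, not the PR's implementation).
/// Byte `k` of value `i` is stored at offset `k * n + i`, so the output is
/// WIDTH streams of `n` bytes each, laid out one after another.
fn byte_stream_split_encode(values: &[f32]) -> Vec<u8> {
    let n = values.len();
    let mut out = vec![0u8; n * WIDTH];
    for (i, v) in values.iter().enumerate() {
        for (k, b) in v.to_le_bytes().iter().enumerate() {
            out[k * n + i] = *b;
        }
    }
    out
}

/// Illustrative decoder (hypothetical helper). Returns `None` when the input
/// length is not a multiple of the value width.
fn byte_stream_split_decode(data: &[u8]) -> Option<Vec<f32>> {
    if data.len() % WIDTH != 0 {
        return None;
    }
    let n = data.len() / WIDTH;
    let mut values = Vec::with_capacity(n);
    for i in 0..n {
        // Gather byte `k` of value `i` back from stream `k`.
        let mut bytes = [0u8; WIDTH];
        for k in 0..WIDTH {
            bytes[k] = data[k * n + i];
        }
        values.push(f32::from_le_bytes(bytes));
    }
    Some(values)
}
```

Encoding `[1.0f32, 2.0]` with this sketch groups the low-order bytes first and the sign/exponent bytes last, which is what tends to make the output compress better. The f64 case would be the same with a width of 8.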
Are there any user-facing changes?
Yes, BYTE_STREAM_SPLIT no longer returns an error when used as the encoding for a column.