* FLOAT, DOUBLE - Signed comparison with special handling of NaNs and
  signed zeros. The details are documented in the
  [Thrift definition](src/main/thrift/parquet.thrift) in the
  `ColumnOrder` union. They are summarized here but the Thrift definition
  is considered authoritative:
  * NaNs should not be written to min or max statistics fields.
  * If the computed max value is zero (whether negative or positive),
    `+0.0` should be written into the max statistics field.
  * If the computed min value is zero (whether negative or positive),
    `-0.0` should be written into the min statistics field.

  For backwards compatibility when reading files:
  * If the min is a NaN, it should be ignored.
  * If the max is a NaN, it should be ignored.
  * If the min is +0, the row group may contain -0 values as well.
  * If the max is -0, the row group may contain +0 values as well.
  * When looking for NaN values, min and max should be ignored.
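The rules quoted above can be sketched in Rust roughly as follows. This is a minimal illustration only, not the arrow-rs implementation: `normalized_min_max` and `usable_bounds` are hypothetical names, and the sketch works on a plain `&[f64]` rather than on the column writer's own types.

```rust
/// Hypothetical writer-side helper (not part of the parquet crate).
/// Returns the (min, max) pair to store in statistics, or `None` when
/// every value is NaN or the slice is empty.
fn normalized_min_max(values: &[f64]) -> Option<(f64, f64)> {
    // NaNs must never end up in the min or max statistics fields.
    let mut non_nan = values.iter().copied().filter(|v| !v.is_nan());
    let first = non_nan.next()?;
    let (mut min, mut max) = (first, first);
    for v in non_nan {
        if v < min {
            min = v;
        }
        if v > max {
            max = v;
        }
    }
    // `-0.0 == 0.0` is true, so the comparisons above cannot tell the two
    // zeros apart. Fix the sign explicitly, as the spec requires.
    if min == 0.0 {
        min = -0.0;
    }
    if max == 0.0 {
        max = 0.0;
    }
    Some((min, max))
}

/// Hypothetical reader-side helper (not part of the parquet crate).
/// Converts stored statistics into bounds that are safe to prune with,
/// even for files written before the zero rules existed.
fn usable_bounds(min: f64, max: f64) -> (Option<f64>, Option<f64>) {
    let lower = if min.is_nan() {
        None // a NaN min carries no information
    } else if min == 0.0 {
        Some(-0.0) // a stored +0 min may hide -0 values
    } else {
        Some(min)
    };
    let upper = if max.is_nan() {
        None // a NaN max carries no information
    } else if max == 0.0 {
        Some(0.0) // a stored -0 max may hide +0 values
    } else {
        Some(max)
    };
    (lower, upper)
}
```

For input `[0.0, NaN, 0.0]`, `normalized_min_max` would return a min of `-0.0` and a max of `+0.0`, which is the behaviour this issue asks the writer to adopt. Neither helper says anything about whether NaNs are present, so a search for NaN values must skip the statistics entirely.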
I plan to take this on, maybe after #5003 is merged so I can do f16, f32, and f64 all at once.
tustvold changed the title from "Parquet: writing zero to statistics should follow spec" to "Parquet: handle signed floating point zeros in statistics" on Nov 7, 2023
Sigh... I wish parquet just used total ordering rather than this mess of special casing. If nothing else it makes actually using the statistics correctly very subtle
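To illustrate the contrast drawn in this comment, here is a small sketch of how Rust's standard partial and total orderings treat the problem values. `compare_both` is a hypothetical helper written for this illustration; only `f64::partial_cmp` and `f64::total_cmp` come from the standard library.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: compares two floats with both the IEEE 754 partial
/// order (`partial_cmp`) and the total order (`total_cmp`).
fn compare_both(a: f64, b: f64) -> (Option<Ordering>, Ordering) {
    (a.partial_cmp(&b), a.total_cmp(&b))
}
```

Under the partial order, `compare_both(-0.0, 0.0)` reports the two zeros as equal and any comparison involving NaN yields `None`, which is why the spec needs the special cases. Under the total order, `-0.0` sorts strictly below `+0.0` and NaNs have a defined position, so no special casing would be needed.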
Describe the bug
https://github.com/apache/parquet-format/blob/46cc3a0647d301bb9579ca8dd2cc356caf2a72d2/README.md?plain=1#L162-L178
Specifically, the writer does not follow the points about the computed max and min values when they are negative or positive zero.
To Reproduce
Add test to parquet/src/column/writer/mod.rs:
Run:
Expected behavior
Test should succeed
Additional context