Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow-ipc: Default to not preserving dict IDs #6788

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

brancz
Copy link
Contributor

@brancz brancz commented Nov 25, 2024

Which issue does this PR close?

Related to #5981

Rationale for this change

This is the first step towards removing the dict_id field as discussed in #5981. With this patch the default behavior changes to what the behavior will be once the field is fully removed.

The previous behavior can still be restored by passing with_preserve_dict_id(true), however, doing so is now deprecated and will be removed together with the dict_id in the next (March) DataFusion release.

What changes are included in this PR?

Default to not preserving the dict ID from the schema field dict_id.

Are there any user-facing changes?

Not a breaking change to an API, but the default behavior changes.

@tustvold @alamb

@tustvold
Copy link
Contributor

The integration tests seem to be legitimately failing on this

@brancz
Copy link
Contributor Author

brancz commented Nov 25, 2024

Is this the appropriate docs page to look at for trying to reproduce this locally? https://github.com/apache/arrow-rs/tree/main/arrow-integration-testing

@brancz
Copy link
Contributor Author

brancz commented Nov 26, 2024

So diving in, it looks like ipc, and c-data work fine, it's just flight. And surprisingly even rust-to-rust seems broken, which is what I'm going to start with, by adding more tests to arrow-flight.

Previously the integration tests forced preserving dict IDs in some
places and used the default in others. This worked fine previously
because preserving dict IDs used to be the default, but it isn't
anymore.
@brancz
Copy link
Contributor Author

brancz commented Nov 26, 2024

@tustvold @alamb tests are now green!

@tustvold tustvold added the api-change Changes to the arrow API label Nov 27, 2024
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, I've labelled it an API change so it is rendered as a breaking change in the changelog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants