[ntuple] Fix remaining issues with deferred columns in RNTupleMerger #17810
Test Results: 18 files, 18 suites, 4d 9h 35m 31s ⏱️. Results for commit e60bf72.
Thanks! Left a few comments below.
If the compression is the expected one, we don't need to check for uncompressible pages.
Looks mostly good to me; please have a look at the comment about empty clusters.
LGTM, just a very minor documentation suggestion for your consideration!
Add the possibility of not copying the clustering when initializing a sink from a descriptor. This is used in the Merger to initialize the sink from the first source in the non-incremental merging case.
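A minimal sketch of what such an option amounts to, using hypothetical, simplified `Descriptor`, `Sink` and `InitSinkFromDescriptor` names rather than ROOT's actual RNTuple classes: the schema is always copied, while the cluster list is copied only on request.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT ROOT's RNTuple classes.
struct ClusterInfo {
   std::uint64_t firstEntry;
   std::uint64_t nEntries;
};

struct Descriptor {
   std::vector<std::string> fieldNames; // schema of the source
   std::vector<ClusterInfo> clusters;   // clustering of the source
};

struct Sink {
   std::vector<std::string> fieldNames;
   std::vector<ClusterInfo> clusters;
};

// Initializes a sink from a descriptor. The schema is always copied; the
// clustering is copied only if copyClusters is true, which leaves the caller
// free to append clusters itself afterwards.
Sink InitSinkFromDescriptor(const Descriptor &desc, bool copyClusters)
{
   Sink sink;
   sink.fieldNames = desc.fieldNames;
   if (copyClusters)
      sink.clusters = desc.clusters;
   return sink;
}
```

With `copyClusters == false` the sink starts with the source's schema but an empty cluster list.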
Fixes (at least) the following cases:
- deferred column in the first source
- deferred column unaligned with a cluster boundary in any source
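To make the unaligned case concrete, here is a toy model (hypothetical names, not ROOT's API, and assuming exactly one column element per entry, as for a simple top-level leaf field) that computes, per cluster, how many elements precede a deferred column's first element index and therefore have no data in that source. When the deferred column starts on a cluster boundary, each cluster is either entirely missing or entirely present; when it starts mid-cluster, one cluster is split and only its leading part is missing, which a merger has to fill somehow (for instance with default values).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model (NOT ROOT's API): one column element per entry.
struct ClusterRange {
   std::uint64_t firstEntry;
   std::uint64_t nEntries;
};

// For each cluster, returns how many leading elements lie before the deferred
// column's first element index, i.e. how many elements have no data in the source.
std::vector<std::uint64_t>
MissingElementsPerCluster(const std::vector<ClusterRange> &clusters, std::uint64_t firstElementIndex)
{
   std::vector<std::uint64_t> missing;
   missing.reserve(clusters.size());
   std::uint64_t expectedFirst = clusters.empty() ? 0 : clusters.front().firstEntry;
   for (const auto &c : clusters) {
      assert(c.firstEntry == expectedFirst && "clusters must be contiguous");
      const std::uint64_t end = c.firstEntry + c.nEntries;
      if (firstElementIndex >= end)
         missing.push_back(c.nEntries); // column does not exist yet in this cluster
      else if (firstElementIndex <= c.firstEntry)
         missing.push_back(0); // column fully present in this cluster
      else
         missing.push_back(firstElementIndex - c.firstEntry); // unaligned: column starts mid-cluster
      expectedFirst = end;
   }
   return missing;
}
```

For three clusters of 100 entries and a deferred column starting at index 150, the result is `{100, 50, 0}`: the middle cluster is the unaligned one.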
Similarly to what we already do for deferred columns and late-model-extended fields, ExtraTypeInfo is now written only in the footer and never in the non-extended header. Otherwise, incrementally merging an RNTuple would end up with duplicate ExtraTypeInfo entries.
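The policy can be illustrated with a small sketch built on hypothetical, simplified types (not ROOT's descriptor classes): extra type information is only ever appended to the footer, and the non-extended header is never touched. The value-based de-duplication is an assumption of this sketch, added to show how repeated merge steps can avoid producing the same entry twice.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in; NOT ROOT's extra type info descriptor.
struct ExtraTypeInfo {
   std::string typeName;
   int typeVersion;
   std::string content;
   bool operator==(const ExtraTypeInfo &other) const
   {
      return typeName == other.typeName && typeVersion == other.typeVersion && content == other.content;
   }
};

struct SerializedLayout {
   std::vector<ExtraTypeInfo> header; // non-extended header: never written by this function
   std::vector<ExtraTypeInfo> footer; // extra type info only ever goes here
};

// Records extra type information coming from one merge input. Entries go to the
// footer only and are skipped if an identical entry is already present
// (de-duplication by value is an assumption of this sketch).
void AddExtraTypeInfos(SerializedLayout &layout, const std::vector<ExtraTypeInfo> &infos)
{
   for (const auto &info : infos) {
      if (std::find(layout.footer.begin(), layout.footer.end(), info) == layout.footer.end())
         layout.footer.push_back(info);
   }
}
```

Merging the same input twice leaves the header empty and a single entry in the footer.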
This pull request:
fixes all remaining known cases of deferred columns being mishandled in the RNTupleMerger, most notably the inability to merge a file with a deferred column in the first source and the mishandling of sources with a deferred column that is not aligned with a cluster boundary.
Several tests are added to cover those cases.
Optimizations left to do
Checklist: