Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] Fix remaining issues with deferred columns in RNTupleMerger #17810

Merged
merged 9 commits into from
Mar 3, 2025

Conversation

silverweed
Copy link
Contributor

This Pull request:

fixes all remaining known cases of mishandling of deferred columns in the RNTupleMerger, most notably the inability of merging a file with a deferred column in the first source and mishandling of sources with a deferred column unaligned with cluster boundaries.
Several tests are added to cover those cases.

Optimizations left to do

  • properly compress synthesized zero pages
  • don't generate leading zero pages for columns when not needed (use the deferred column machinery instead).

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

Copy link

github-actions bot commented Feb 24, 2025

Test Results

    18 files      18 suites   4d 9h 35m 31s ⏱️
 2 719 tests  2 719 ✅ 0 💤 0 ❌
47 253 runs  47 253 ✅ 0 💤 0 ❌

Results for commit e60bf72.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@enirolf enirolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Left a few comments below.

If the compression is the expected one we don't need to check for
uncompressible pages
@silverweed silverweed force-pushed the ntuple_merge_deferred3 branch 2 times, most recently from 90ce31a to f4bf5e1 Compare February 27, 2025 14:29
Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me, please have a look at the comment about empty clusters

Copy link
Contributor

@enirolf enirolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a very minor documentation suggestion for your consideration!

@silverweed silverweed force-pushed the ntuple_merge_deferred3 branch from f4bf5e1 to 8beb904 Compare February 28, 2025 14:47
Adding the possibility of not copying the clustering when initting a
sink from a descriptor. This is used in the Merger to init the first
source in the non-incremental merging case.
Fixes (at least) the following cases:
- deferred column in the first source
- deferred column unaligned with a cluster boundary in any source
Similarly to what we're doing for deferred columns and late-model
extended fields, we only write those in the footer and never in the
non-extended header. Otherwise we get in trouble when incrementally
merging a RNTuple, as it will end up with duplicate ExtraTypeInfo
@silverweed silverweed force-pushed the ntuple_merge_deferred3 branch from 8beb904 to e60bf72 Compare February 28, 2025 14:49
@silverweed silverweed merged commit 3c55661 into root-project:master Mar 3, 2025
21 checks passed
@silverweed silverweed deleted the ntuple_merge_deferred3 branch March 3, 2025 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants