feat: support `unknown_length` for virtual arrays in order to read without any materialization #3475
Open: ikrommyd wants to merge 27 commits into `main` from `ikrommyd/unknown-length-virtualarray`
…ng .length in recordarray
…of the Sentinel objects
… with VirtualArrays
pfackeldey requested changes on Apr 22, 2025
3 tasks
This PR introduces `unknown_length` support for virtual arrays.

Because multiple arrays can share the same shape (for example, when they come from the same offsets) and because the awkward codebase needs to know shapes and lengths very often, the most consistent way to add `unknown_length` support is to introduce a separate shape generator for virtual arrays that returns a shape tuple when called. This makes it possible to generate the shape of an array without generating its data.

We distinguish between the private `._shape` and `._length` and the public `.shape` and `.length` properties: the public ones materialize the shape, while the private ones don't. We extend this logic to the layouts. For layouts that define a private `self._length` property as a function of their content, we initialize it to `unknown_length` in `__init__` in the virtual array case and only compute it the first time `.length` is accessed on that layout, so that layouts can be instantiated without materializing shapes. We also avoid materializing shapes in the `__repr__` of the layouts.

To make this easier, we introduce two helper utilities, `maybe_shape_of` and `maybe_length_of`.

Finally, we make the necessary changes in `from_buffers` to construct the proper data and shape generators to pass down to the `VirtualArray` buffers.

This has also been tested through coffea with the ADL benchmarks, the coffea processors example, and the AGC, where we observe no materialization when reading with nanoevents and materialization of exactly the right buffers when running the analysis snippets.
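
For readers who want a feel for the private/public split described above, here is a minimal, self-contained sketch. It is not the code in this PR: `ToyVirtualArray`, `ToyListOffsetLayout`, and the sentinel class are hypothetical stand-ins written only for illustration, and the two helpers merely borrow the names `maybe_shape_of` and `maybe_length_of` without claiming to match their real signatures.

```python
class _UnknownLength:
    """Hypothetical sentinel standing in for awkward's unknown_length."""

    def __repr__(self):
        return "unknown_length"


unknown_length = _UnknownLength()


class ToyVirtualArray:
    """Toy 1-D virtual buffer with separate shape and data generators."""

    def __init__(self, shape_generator, data_generator):
        self._shape_generator = shape_generator
        self._data_generator = data_generator
        self._shape = (unknown_length,)  # private: reading it never does work
        self._data = None

    @property
    def shape(self):
        # Public: materializes the shape only, never the data.
        if self._shape[0] is unknown_length:
            self._shape = tuple(self._shape_generator())
        return self._shape

    @property
    def is_materialized(self):
        return self._data is not None

    def materialize(self):
        # Runs the data generator once and caches the result.
        if self._data is None:
            self._data = self._data_generator()
            self._shape = (len(self._data),)
        return self._data


def maybe_shape_of(obj):
    """Return a shape without triggering any generator (assumed behaviour)."""
    if isinstance(obj, ToyVirtualArray):
        return obj._shape
    return (len(obj),)


def maybe_length_of(obj):
    return maybe_shape_of(obj)[0]


class ToyListOffsetLayout:
    """Toy layout whose length is derived from its offsets buffer."""

    def __init__(self, offsets):
        self._offsets = offsets
        n = maybe_length_of(offsets)
        # Stay unknown if the offsets' length is unknown; no materialization.
        self._length = unknown_length if n is unknown_length else n - 1

    @property
    def length(self):
        # Computed on first access, using only the shape generator.
        if self._length is unknown_length:
            self._length = self._offsets.shape[0] - 1
        return self._length

    def __repr__(self):
        # Uses the private attribute so printing never materializes anything.
        return f"<ToyListOffsetLayout len={self._length!r}>"


calls = []


def offsets_shape():
    calls.append("shape")
    return (4,)


def offsets_data():
    calls.append("data")
    return [0, 2, 2, 5]


layout = ToyListOffsetLayout(ToyVirtualArray(offsets_shape, offsets_data))
print(repr(layout))   # <ToyListOffsetLayout len=unknown_length>
print(layout.length)  # 3
print(calls)          # ['shape']
```

In this toy, constructing and printing the layout calls neither generator, and asking for `.length` runs only the shape generator, which is the behaviour the description aims for.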