feat: add virtual arrays #1277

Draft
wants to merge 11 commits into
base: master

@pfackeldey pfackeldey commented Feb 18, 2025

marked as draft because:

If you want to try it with the corresponding awkward branch:

from coffea.nanoevents import NanoEventsFactory

events = NanoEventsFactory.from_root({"tests/samples/nano_dy.root": "Events"}, mode="virtual").events()

events.Muon.pt
# <Array [??, ??, ??, ??, ..., ??, ??, ??, ??] type='40 * var * float32[param...'>

events.Jet.eta
# <Array [??, ??, ??, ??, ..., ??, ??, ??, ??] type='40 * var * float32[param...'>

events.run
# <Array [??, ??, ??, ??, ..., ??, ??, ??, ??] type='40 * uint32[parameters={...'>

import awkward as ak

events.Muon.pt.layout
# <ListOffsetArray len='40'>
#     <offsets><Index dtype='int64' len='41'>[## ... ##]</Index></offsets>
#     <content><NumpyArray dtype='float32' len='##'>
#         <parameter name='__doc__'>'pt'</parameter>
#         [## ... ##]
#     </NumpyArray></content>
# </ListOffsetArray>

ak.materialize(events.Muon.pt).layout
# <ListOffsetArray len='40'>
#     <offsets><Index dtype='int64' len='41'>
#         [ 0  0  0  0  0  2  3  5  5  5  5  6  6  6  6  7  7  7  7  7  7  7  7
#           7  9 11 11 13 13 15 15 16 17 17 17 18 18 18 18 18 18]
#     </Index></offsets>
#     <content><NumpyArray dtype='float32' len='18'>
#         <parameter name='__doc__'>'pt'</parameter>
#         [76.75332   20.131409  31.038704  50.641342  14.330103  16.724983
#          13.908063  46.498886  40.667812  51.43576   39.59279   38.89543
#          33.712822  17.082792  14.526995   4.3605423 10.117709  17.949919 ]
#     </NumpyArray></content>
# </ListOffsetArray>
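
As a rough illustration of what the `??` and `##` placeholders above stand for, here is a toy sketch of on-demand loading. It is not how awkward implements virtual arrays; `LazyBuffer` and `read_pt_column` are hypothetical names invented for this example, and the values are made up.

```python
from typing import Callable, List, Optional


class LazyBuffer:
    """Toy stand-in for a virtual buffer: stores a loader, not the data."""

    def __init__(self, loader: Callable[[], List[float]]) -> None:
        self._loader = loader
        self._data: Optional[List[float]] = None
        self.load_count = 0  # how many times the loader actually ran

    @property
    def is_materialized(self) -> bool:
        return self._data is not None

    def materialize(self) -> List[float]:
        # Run the loader only on first access, then reuse the cached result.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data

    def __repr__(self) -> str:
        if self._data is None:
            return "<LazyBuffer [??]>"
        return f"<LazyBuffer {self._data}>"


def read_pt_column() -> List[float]:
    # Hypothetical stand-in for an expensive read from a file.
    return [76.75, 20.13, 31.04]


pt = LazyBuffer(read_pt_column)
print(pt)
# <LazyBuffer [??]>

pt.materialize()
print(pt)
# <LazyBuffer [76.75, 20.13, 31.04]>
```

Nothing is read until `materialize()` is called, and repeated calls reuse the cached data instead of reading again.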

pfackeldey and others added 8 commits February 18, 2025 18:37
…nested collections first, this allows for more deeply nested objects to exist
…1278)

* ci: try pytest-xdist

* Update ci.yml

* ci: parallelize tests that don't depend on dask-client

@pfackeldey
Contributor Author

I've deprecated delayed and added a new argument called mode. mode can be "eager" (loads all columns), "virtual" (creates a virtual array that loads on demand), or "dask" (creates a dask-awkward array). If mode is None (the default), delayed=True corresponds to the "dask" case and delayed=False to the "virtual" case. If mode is given, it takes precedence over delayed. I added a deprecation for delayed; the removal date is not yet decided.
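
The precedence rule described above can be sketched roughly as follows. This is only an illustration of the described behavior, not the code in this PR: the helper name `resolve_mode`, the point at which the warning is emitted, and the `ValueError` on unknown modes are all assumptions.

```python
import warnings

VALID_MODES = ("eager", "virtual", "dask")


def resolve_mode(mode, delayed):
    """Hypothetical helper: pick the effective mode from `mode` and `delayed`."""
    if mode is not None:
        # An explicit mode takes precedence over delayed.
        if mode not in VALID_MODES:
            # Assumption: unknown modes are rejected with a ValueError.
            raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
        return mode

    # mode is None: fall back to the deprecated delayed flag.
    # Assumption: the deprecation warning is emitted on this fallback path.
    warnings.warn(
        "delayed is deprecated, use mode instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return "dask" if delayed else "virtual"
```

With this sketch, `resolve_mode(None, True)` gives "dask", `resolve_mode(None, False)` gives "virtual", and `resolve_mode("eager", True)` gives "eager" because the explicit mode wins.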

The only thing missing now is a new awkward version that contains virtual arrays. Otherwise, this PR is ready.
