
Virtual Array TODOs #1308

Open
3 tasks
ikrommyd opened this issue Apr 8, 2025 · 3 comments

Labels
enhancement New feature or request

Comments

@ikrommyd
Collaborator

ikrommyd commented Apr 8, 2025

With #1277 merged, coffea supports opening a file using virtual arrays. With scikit-hep/awkward#3475, it can finally do so without needing to know the buffer shapes in advance.
This issue tracks what is still pending to fully support virtual arrays.

  • Bring back the coffea 0.7-like executors.
  • Investigate whether coffea should cache the original uncut events in the @original_array attribute. Since they are virtual, the memory overhead is only the Python objects, not materialized arrays, so it's probably worth keeping them there to avoid calling from_buffers again whenever we need the original events.
  • Can we do something about objects that are really regular arrays, like LHEPdfWeights, which uproot deserializes as list-offset arrays? That looks like a needless offsets calculation to me, but there may not be another way.
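The last two items can be illustrated with a small pure-Python sketch. It is a toy model under stated assumptions, not coffea's or awkward's implementation: `VirtualBuffer`, `regular_size`, and `to_regular` are hypothetical names, and the weights and offsets are made-up data. `VirtualBuffer` shows why keeping uncut virtual events around is cheap (only the Python object exists until the data is requested), and the offsets helpers show that, assuming every event stores the same number of weights, the offsets carry no information beyond a single size.

```python
from typing import Callable, List, Optional, Sequence


class VirtualBuffer:
    """Hypothetical stand-in for a virtual buffer (not awkward's API):
    it holds only a generator function until the data is requested."""

    def __init__(self, generator: Callable[[], List[float]]):
        self._generator = generator
        self._data: Optional[List[float]] = None
        self.loads = 0  # how many times the generator actually ran

    def materialize(self) -> List[float]:
        # Run the (possibly expensive) read only on first access, then reuse it.
        if self._data is None:
            self._data = self._generator()
            self.loads += 1
        return self._data


def regular_size(offsets: Sequence[int]) -> Optional[int]:
    """Return the common sublist length if all sublists have the same length, else None."""
    if len(offsets) < 2:
        return None
    size = offsets[1] - offsets[0]
    for start, stop in zip(offsets, offsets[1:]):
        if stop - start != size:
            return None
    return size


def to_regular(content: Sequence[float], offsets: Sequence[int]) -> List[List[float]]:
    """Split flat content into fixed-size rows; only valid for regular offsets."""
    size = regular_size(offsets)
    if size is None:
        raise ValueError("offsets describe variable-length lists")
    start = offsets[0]
    return [list(content[start + i * size : start + (i + 1) * size])
            for i in range(len(offsets) - 1)]


# Holding the buffer costs only the Python object; nothing is read yet.
weights = VirtualBuffer(lambda: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
offsets = [0, 3, 6]  # hypothetical: every event has exactly 3 weights
rows = to_regular(weights.materialize(), offsets)
```

On real arrays, awkward's `ak.to_regular` performs the analogous conversion, but the offsets still have to be read and checked first, which is exactly the overhead the item asks about.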
@ikrommyd
Collaborator Author

ikrommyd commented Apr 8, 2025

@lgray @pfackeldey please edit if you think I forgot something.

@ikrommyd ikrommyd added the enhancement New feature or request label Apr 8, 2025
@ikrommyd
Collaborator Author

ikrommyd commented Apr 8, 2025

After a chat with Peter, I reordered them by priority (in my opinion). The last item:

Investigate using coffea's caches to store deserialized offsets branches when opening the file, to avoid deserializing the same offsets more than once (some objects might share offsets).

we're probably not going to do, because the other items should already solve the problem, so there is no reason to add a second solution to the same problem.
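For reference, the offsets-caching idea quoted above could look roughly like the sketch below, even though the comment concludes it is probably unnecessary. Everything here is hypothetical: `OffsetsCache` and `FAKE_COUNTS` are illustrative names, the counter branches are made-up data, and a real implementation would go through coffea's and uproot's own reading machinery. The sketch assumes that collections sharing a counter (as `Jet_pt` and `Jet_eta` share `nJet` in NanoAOD-like files) can share one offsets array.

```python
from itertools import accumulate
from typing import Callable, Dict, List

# Hypothetical per-event counter branches, as they might appear in a NanoAOD-like file.
FAKE_COUNTS: Dict[str, List[int]] = {"nJet": [2, 0, 3], "nMuon": [1, 1, 0]}


class OffsetsCache:
    """Hypothetical cache that builds the offsets for each counter branch at most once."""

    def __init__(self, read_counts: Callable[[str], List[int]]):
        self._read_counts = read_counts
        self._cache: Dict[str, List[int]] = {}
        self.reads = 0  # number of times a counter branch was actually read

    def get(self, counter: str) -> List[int]:
        if counter not in self._cache:
            counts = self._read_counts(counter)
            # offsets[i]:offsets[i + 1] delimits the entries of event i
            self._cache[counter] = list(accumulate(counts, initial=0))
            self.reads += 1
        return self._cache[counter]


cache = OffsetsCache(lambda name: FAKE_COUNTS[name])
jet_pt_offsets = cache.get("nJet")   # Jet_pt and Jet_eta share the nJet counter
jet_eta_offsets = cache.get("nJet")  # served from the cache, no second read
```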

@ikrommyd
Collaborator Author

Well, scikit-hep/awkward#3475 reduces this issue a lot, so I edited it accordingly 😄
