
Remove Arrow support from obspec.List #13


Open
kylebarron opened this issue May 8, 2025 · 0 comments · May be fixed by #14


Member

kylebarron commented May 8, 2025

Obstore supports returning an Arrow RecordBatch for each chunk yielded by obstore.list, and an Arrow Table from obstore.list_with_delimiter.

I would like this to be an obstore implementation detail instead of a requirement of all obspec implementations.

I had hoped to remove the return_arrow keyword from obspec's list methods while still allowing obstore's implementation to add it, as long as it defaults to False and returns a list[ObjectMeta] by default. However, it looks like this doesn't pass Pylance:

[Screenshot of the Pylance error]

See what I tried in #14.
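The design described above (not necessarily the exact code in #14) can be sketched roughly as follows, under heavy simplification: a protocol whose `list` has no `return_arrow` keyword, plus an implementation that adds `return_arrow: bool = False` with `typing.overload` so that the default call still yields `ObjectMeta` chunks. Every name here (`ListWithoutArrow`, `InMemoryStore`, `total_size`, and the two-field `ObjectMeta`) is hypothetical and not the real obspec or obstore API, and a dict of columns stands in for an Arrow RecordBatch so the sketch needs no Arrow dependency. It runs as plain Python because protocols are not enforced at runtime; whether a static checker such as Pylance accepts `InMemoryStore` where a `ListWithoutArrow` is expected is exactly the open question in this issue, and this sketch does not claim to settle it.

```python
from typing import Any, Dict, Iterator, List, Literal, Optional, Protocol, TypedDict, overload


class ObjectMeta(TypedDict):
    """Hypothetical, simplified object metadata (real metadata has more fields)."""
    path: str
    size: int


class ListWithoutArrow(Protocol):
    """Hypothetical obspec-style protocol: list() has no return_arrow keyword."""

    def list(self, prefix: Optional[str] = None) -> Iterator[List[ObjectMeta]]: ...


class InMemoryStore:
    """Hypothetical implementation that adds an opt-in return_arrow keyword."""

    def __init__(self, objects: Dict[str, int]) -> None:
        self._objects = objects  # maps path -> size in bytes

    @overload
    def list(
        self, prefix: Optional[str] = None, *, return_arrow: Literal[False] = False
    ) -> Iterator[List[ObjectMeta]]: ...
    @overload
    def list(
        self, prefix: Optional[str] = None, *, return_arrow: Literal[True]
    ) -> Iterator[Dict[str, List[Any]]]: ...
    def list(self, prefix: Optional[str] = None, *, return_arrow: bool = False) -> Iterator[Any]:
        # Collect matching objects in path order.
        metas: List[ObjectMeta] = [
            ObjectMeta(path=p, size=s)
            for p, s in sorted(self._objects.items())
            if prefix is None or p.startswith(prefix)
        ]
        if return_arrow:
            # Stand-in for an Arrow RecordBatch: one column per field.
            yield {"path": [m["path"] for m in metas], "size": [m["size"] for m in metas]}
        else:
            # Default: a single chunk of plain ObjectMeta dicts.
            yield metas


def total_size(store: ListWithoutArrow, prefix: Optional[str] = None) -> int:
    """Generic code written only against the protocol, never passing return_arrow."""
    return sum(meta["size"] for chunk in store.list(prefix) for meta in chunk)


store = InMemoryStore({"a/1.txt": 3, "a/2.txt": 4, "b/3.txt": 10})
print(total_size(store))        # 17
print(total_size(store, "a/"))  # 7
print(next(store.list("a/", return_arrow=True)))
```

The point of the overloads is that a caller who never passes `return_arrow` sees the same return type the protocol promises, while obstore-specific callers can still opt in to the columnar result.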

That said, since obspec's list is defined in terms of the Arrow PyCapsule Interface, setting return_arrow=True allows for very generic programming. The returned object could come from pandas, Polars, DuckDB, pyarrow, nanoarrow, arro3, or any other library that supports the protocol. (One wrinkle: list requires an object that implements the ArrowArray interface, which I'm not sure pandas or Polars define, since they only have a concept of a multi-chunked data structure.)
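The wrinkle above comes down to two different export methods in the Arrow PyCapsule Interface: `__arrow_c_array__`, which exports a single array or record batch as a pair of (schema, array) capsules, and `__arrow_c_stream__`, which exports a stream of possibly many chunks. A minimal sketch, using only the standard library, of how generic code could check which of the two a returned object supports. `ArrowArrayExportable`, `ArrowStreamExportable`, `describe_arrow_export`, and the two stand-in classes are hypothetical names; the stand-ins return dummy objects rather than real PyCapsules, so they are only useful for the structural check shown here.

```python
from typing import Any, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrowArrayExportable(Protocol):
    """Exports one contiguous Arrow array (e.g. a record batch)."""

    def __arrow_c_array__(
        self, requested_schema: Optional[object] = None
    ) -> Tuple[object, object]: ...


@runtime_checkable
class ArrowStreamExportable(Protocol):
    """Exports a stream of (possibly many) Arrow chunks."""

    def __arrow_c_stream__(self, requested_schema: Optional[object] = None) -> object: ...


def describe_arrow_export(obj: Any) -> str:
    """Report which PyCapsule export an object offers, preferring the array form."""
    # isinstance on a runtime_checkable Protocol only checks that the methods exist.
    if isinstance(obj, ArrowArrayExportable):
        return "array"
    if isinstance(obj, ArrowStreamExportable):
        return "stream"
    return "none"


class FakeBatch:
    """Hypothetical stand-in for a single-chunk object; returns dummy capsules."""

    def __arrow_c_array__(self, requested_schema=None):
        return (object(), object())


class FakeChunkedTable:
    """Hypothetical stand-in for a multi-chunk object that only offers a stream."""

    def __arrow_c_stream__(self, requested_schema=None):
        return object()


print(describe_arrow_export(FakeBatch()))         # array
print(describe_arrow_export(FakeChunkedTable()))  # stream
```

An object that only offers the stream form would fail a check that requires `__arrow_c_array__`, which is the gap the parenthetical above is worried about for multi-chunk data structures.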

kylebarron linked a pull request on May 8, 2025 that will close this issue.