Remove the need for registering an ObjectStore for remote files #899

Open
mesejo opened this issue Oct 5, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@mesejo
Contributor

mesejo commented Oct 5, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, we need to register an ObjectStore before we can register remote files stored in S3, HTTPS, etc. This could be more ergonomic from a DX perspective.

Describe the solution you'd like
Automatically detect if an ObjectStore is needed, like in the datafusion-cli; see here.
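A minimal sketch of what such detection could look like on the Python side, assuming the decision is driven purely by the URL scheme. The function name `object_store_key` and the scheme table are hypothetical and are not part of datafusion-python or datafusion-cli.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical scheme table; a real implementation would list whichever
# backends the bindings are actually built with.
REMOTE_SCHEMES = {"s3", "gs", "gcs", "az", "abfs", "abfss", "http", "https"}


def object_store_key(path: str) -> Optional[str]:
    """Return the "scheme://host" prefix a store could be registered under,
    or None when the path is local and no ObjectStore is needed."""
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    if scheme not in REMOTE_SCHEMES:
        # Covers plain paths, file:// URLs, and Windows drive letters.
        return None
    return f"{scheme}://{parsed.netloc}"
```

With this shape, a call such as `object_store_key("s3://my-bucket/data/file.parquet")` yields the bucket-level prefix, and local paths fall through without any registration step.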

Describe alternatives you've considered
Keep it as it is now.

Additional context
We should increase the test coverage for ObjectStore.

@mesejo mesejo added the enhancement New feature or request label Oct 5, 2024
@ion-elgreco
Contributor

Just like delta-rs, polars, etc., we should infer the object store from the URI and then read the storage options from either a parameter or environment variables.
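One way to picture the "parameter or environment variables" part is a small resolver in which explicit options win over the environment. This is only a sketch: `resolve_storage_options` and the `ENV_KEYS` table are hypothetical names, and the lowercase key convention is an assumption rather than the behaviour of delta-rs or polars.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variables consulted per URI scheme. The variable names are the
# standard cloud ones; the table itself is illustrative only.
ENV_KEYS = {
    "s3": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"),
    "gs": ("GOOGLE_APPLICATION_CREDENTIALS",),
}


def resolve_storage_options(
    scheme: str,
    storage_options: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge environment defaults with explicitly passed storage options."""
    env = os.environ if environ is None else environ
    # Start from whatever the environment provides for this scheme.
    resolved = {
        key.lower(): env[key] for key in ENV_KEYS.get(scheme, ()) if key in env
    }
    # Explicit options override the environment, key by key.
    for key, value in (storage_options or {}).items():
        resolved[key.lower()] = value
    return resolved
```

Passing `environ` explicitly is just a convenience for testing; by default the function reads `os.environ`.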

@timsaucer
Contributor

Related, though not yet on crates.io: https://github.com/developmentseed/object-store-rs

@kylebarron
Contributor

> Related, though not yet on crates.io: developmentseed/object-store-rs

I'd be happy to work with you to use this! Indeed this is the explicit goal of pyo3-object_store, to define the Python builders for ObjectStore once and then reuse them for multiple projects.

@robtandy

@kylebarron What are your thoughts on how to approach this? I'm happy to try to address it and submit a PR for it.

I'd like the same thing for github.com/apache/datafusion-ray: after the rewrite, I need to add support for object stores back in, and I'd love for it to be consistent with how it will work in datafusion-python.

@kylebarron
Contributor

kylebarron commented Feb 26, 2025

I'd suggest waiting until the object_store 0.12 release (and then for datafusion to adopt it), because pyo3_object_store is pinned to the latest main of object_store until that release. But you can see how I'm reusing pyo3_object_store here: https://github.com/developmentseed/async-tiff/pull/17/files

In essence, you can just re-export the store builders and then just accept PyObjectStore in whatever function that should interact with a store.

Then copy the type hints for the builders if desired. That exposes the full builder API as documented within obstore.store: https://developmentseed.org/obstore/latest/api/store/aws/ (so your Python API can supply all possible configuration settings to object_store)

@robtandy

I'm not sure I follow exactly. Do we need to use Python bindings to object_store? If datafusion-ray and datafusion-python are going to automatically register a store based on the URL of the data files within register_listing_table, won't the interaction with the store happen in Rust?

@kylebarron
Contributor

To answer the original question of this issue: I think it's necessary to export classes to Python to customize object_store instances, because people often need custom config/authentication, and putting all config in env variables isn't always possible.

But it's an API decision whether to require that the ObjectStore class is created manually. You can alternatively create one from the environment when one isn't passed in by hand.

But the point of pyo3_object_store is that you still get raw ObjectStore instances under the hood.
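The API choice described above (accept an explicit store, otherwise build one from the environment) could be sketched roughly as follows. Everything here is hypothetical: `StoreConfig` is a plain stand-in for a real ObjectStore handle, and `store_from_env` and `register_remote_table` are illustrative names, not functions from pyo3_object_store or datafusion-python.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass
class StoreConfig:
    """Hypothetical stand-in for a configured ObjectStore instance."""
    url: str
    options: Dict[str, str] = field(default_factory=dict)
    source: str = "explicit"


def store_from_env(url: str, environ: Optional[Mapping[str, str]] = None) -> StoreConfig:
    """Build a store description from AWS_* variables (assumed convention)."""
    env = os.environ if environ is None else environ
    options = {k: v for k, v in env.items() if k.startswith("AWS_")}
    return StoreConfig(url=url, options=options, source="environment")


def register_remote_table(
    name: str,
    url: str,
    store: Optional[StoreConfig] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> StoreConfig:
    """Use the caller's store when given, else fall back to the environment."""
    if store is None:
        store = store_from_env(url, environ)
    # A real binding would hand `store` to the Rust side at this point.
    return store
```

Users with custom authentication pass their own store, while the common case needs no extra setup; both paths end with the same kind of object being handed to the engine.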


5 participants