Expose object_store for direct use #1008

Open
@matko

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I need to be able to delete old resources generated by write_parquet() and similar methods, move them out of the way, or perform similar operations that broadly fall into the category of data/artifact cleanup. DataFusion doesn't implement such move/delete operations directly, so this requires a different library. Depending on the environment I'm operating in (local file system, S3, Google Cloud Storage bucket), this requires a slightly different setup and different operations.

However, DataFusion ships with ObjectStore, a generic frontend for many different object store systems, mainly S3 and its equivalents in other cloud environments, which also provides an API-compatible local file storage implementation. DataFusion uses it to read from and write to such object stores. In datafusion-python, these ObjectStore objects are opaque handles that are only useful for registering with a session context. In Rust, however, they also let the user directly manipulate objects in these stores: fetch them, delete them, move them, and so on.

Describe the solution you'd like
I would like the datafusion-python version of ObjectStore to be more than an opaque handle and to allow access to the underlying methods. This would let me generically implement operations on generated artifacts that are not possible in DataFusion directly.
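
To make the request concrete, here is a rough sketch of the kind of interface I have in mind. Everything in it is hypothetical: `ObjectStoreOps`, `list_objects`, `delete`, `rename`, `LocalStore`, and `archive_artifacts` are illustrative names, not existing datafusion-python API, and `LocalStore` is only a stdlib stand-in for a real backend. The point is that the cleanup function is written once against the interface and never needs to know which backend it is talking to.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Iterator, Protocol


class ObjectStoreOps(Protocol):
    """Hypothetical interface; NOT part of datafusion-python today."""

    def list_objects(self, prefix: str) -> Iterator[str]: ...
    def delete(self, location: str) -> None: ...
    def rename(self, src: str, dst: str) -> None: ...


class LocalStore:
    """Stand-in backend rooted at a local directory (illustration only)."""

    def __init__(self, root: str | Path) -> None:
        self.root = Path(root)

    def list_objects(self, prefix: str) -> Iterator[str]:
        base = self.root / prefix
        if not base.exists():
            return
        # Yield file locations relative to the store root, in sorted order.
        for path in sorted(base.rglob("*")):
            if path.is_file():
                yield path.relative_to(self.root).as_posix()

    def delete(self, location: str) -> None:
        (self.root / location).unlink()

    def rename(self, src: str, dst: str) -> None:
        target = self.root / dst
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(self.root / src), str(target))


def archive_artifacts(
    store: ObjectStoreOps, prefix: str, archive_prefix: str
) -> list[str]:
    """Move every object under `prefix` to `archive_prefix`, on any backend."""
    moved = []
    # Materialize the listing first so moves do not disturb the iteration.
    for location in list(store.list_objects(prefix)):
        new_location = archive_prefix + location[len(prefix):]
        store.rename(location, new_location)
        moved.append(new_location)
    return moved
```

With something like this, `archive_artifacts(store, "out", "old")` would work unchanged whether `store` is backed by a local directory or a bucket, provided the same object can also be registered with the session context.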

Describe alternatives you've considered
The workaround is to use an S3-compatible library directly. This doesn't help with local files, though, which still require a separate code path.
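
For illustration, the workaround ends up looking roughly like the sketch below. `delete_artifact` is a hypothetical helper, boto3 is just one example of an S3 client, and the sketch assumes POSIX-style local paths and ambient AWS credentials. Each storage scheme gets its own branch, and the connection details are configured separately from whatever was registered with DataFusion.

```python
from pathlib import Path
from urllib.parse import urlparse


def delete_artifact(uri: str) -> None:
    """Delete one artifact; every storage scheme needs its own branch."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        # Imported lazily so the local branch works without boto3 installed.
        import boto3

        boto3.client("s3").delete_object(
            Bucket=parsed.netloc, Key=parsed.path.lstrip("/")
        )
    elif parsed.scheme in ("", "file"):
        # Plain paths and file:// URIs are handled with the standard library.
        Path(parsed.path).unlink()
    else:
        # Other backends (e.g. gs://) would each need yet another client.
        raise ValueError(f"no cleanup path for scheme {parsed.scheme!r}")
```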

Another possibility is a separate Python library wrapping the Rust object_store crate, as arguably it's not the job of datafusion-python to provide a good API for this. However, it's useful to be able to define just one ObjectStore (for example, from a configuration) and use it both for DataFusion and for related object-store operations like artifact cleanup.

Metadata

Labels: enhancement (New feature or request)