Reading parquet from s3 #22
Comments
Thanks for writing, @dsebban. Hit the star button if you like the project :) Now, regarding reads from S3 (or any object store): the support in polars-rust covers the Parquet format only and has to be explicitly enabled via features. I have purposefully not done this so far because, if things don't work in a similar way for all formats, it leads to API inconsistency. The underlying crate One tricky way to get around this would be to read files as byte streams using Scala/Java streams and then pass them over JNI to the Rust side. The passed stream could then be fed to the Polars readers for all formats. The problem with this approach is that I haven't tried it before, and I doubt it can be done in a performant way. If you have any other ideas on this, or if you want to take a stab at it, PRs are most welcome.
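For illustration, the JVM half of the byte-stream workaround described above could look roughly like the sketch below. It is plain Java using only the standard library, and it only shows how a stream might be drained into a direct buffer that native code could read. The class name `StreamToNativeBuffer`, the method `readFully`, and the commented-out native method are hypothetical and are not part of this project; the Rust side that would hand the bytes to a Polars reader is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the project: drains a stream into a direct buffer.
final class StreamToNativeBuffer {

    // Reads the whole stream (for example, an object-store download stream) into memory
    // and copies it into a direct ByteBuffer. Direct buffers are allocated outside the
    // JVM heap, so native code can obtain their address through JNI's
    // GetDirectBufferAddress instead of copying a byte[] across the boundary.
    static ByteBuffer readFully(InputStream in) throws IOException {
        byte[] bytes = in.readAllBytes();            // requires Java 9 or newer
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length);
        buf.put(bytes);
        buf.flip();                                  // switch the buffer from writing to reading
        return buf;
    }

    // Assumed native entry point (illustrative only, would be implemented in Rust):
    // static native long readFromBuffer(ByteBuffer data, String format);

    public static void main(String[] args) throws IOException {
        // Stand-in for a real object-store stream.
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        ByteBuffer buf = readFully(in);
        System.out.println("direct=" + buf.isDirect() + ", bytes=" + buf.remaining());
    }
}
```

Note that this sketch buffers the entire object in memory before anything crosses the JNI boundary, which is one reason to doubt that the approach would be performant for large files.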
Thank you for your answer! I see your concerns about API uniformity between read and write. I also don't think passing bytes through the JNI layer can be done in a performant manner. I will dig into the code to understand a bit more about your statement about async and Polars not working well together. Ideally you would like a Spark-like API, right? df.read.options(..).path(s3/local json/csv/arrow)
Hi, thank you for this awesome work, @chitralverma!
I am trying to read from S3 in Scala. I can see that writing is pretty simple, as you added a utility function to pass options in write_utils.rs. I am trying to do the same when reading into a df, but passing an S3 path to Polars.scan obviously throws.
Is there something I need to tweak in the config to be able to read directly from S3? Maybe we need a read_utils.rs. I would be able to contribute if you have some guidance :)