Add flexibility to save downloaded queries in different file formats #29

Open
shouryakhanna opened this issue Mar 10, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@shouryakhanna
Member

Add functionality to save downloaded queries in different file formats, to avoid heavy CSV files. How to handle the additional header information still has to be sorted out.

@emilyhunt

I'd really appreciate it if support for the parquet format could be added, since it's a much quicker format for large dataframes when saving and loading tables.

Maybe the information in the header could be stored in an accompanying metadata file or in the filename?

HDF files would allow information to be stored in headers and would give smaller file sizes, but wouldn't be nearly as fast as Parquet (see second link above).
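The accompanying-metadata-file idea could look roughly like the sketch below. It assumes the header information fits in a plain dict that JSON can serialize; the helper names save_with_sidecar and load_with_sidecar and the ".meta.json" suffix are hypothetical, not part of the package. Writing Parquet through pandas needs pyarrow or fastparquet installed, so the format is left as a parameter.

```python
import json
from pathlib import Path

import pandas as pd


def save_with_sidecar(df, path, metadata, fmt="parquet"):
    """Write df in the given format plus a JSON sidecar holding header info.

    Hypothetical helper: `fmt` must name a pandas writer such as
    "parquet" (needs pyarrow or fastparquet) or "csv".
    """
    path = Path(path)
    writer = getattr(df, f"to_{fmt}")  # e.g. df.to_parquet or df.to_csv
    writer(path, index=False)
    # Keep the header information next to the table instead of inside it.
    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar


def load_with_sidecar(path, fmt="parquet"):
    """Read back the table and its sidecar metadata (hypothetical helper)."""
    path = Path(path)
    reader = getattr(pd, f"read_{fmt}")  # e.g. pd.read_parquet or pd.read_csv
    df = reader(path)
    sidecar = path.with_name(path.name + ".meta.json")
    metadata = json.loads(sidecar.read_text())
    return df, metadata
```

The table file stays a plain, fast-to-load Parquet (or CSV) file, and the header information survives in a small text file that any tool can read.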

@agabrown agabrown self-assigned this Sep 11, 2024
@alfredcas alfredcas self-assigned this Sep 11, 2024
@agabrown agabrown removed their assignment Sep 11, 2024
@alfredcas alfredcas added the enhancement New feature or request label Sep 11, 2024
@mfouesneau
Member

```python
def __init__(self, subsample_query, file_name, hplevel_and_binning):
    ...
```

You can add a keyword argument, e.g. format='csv', and then use it transparently in all the filenames. The only place that needs more work is the pandas calls: you will have to do getattr(df, f'to_{format}') to get the writing function (and similarly getattr(pd, f'read_{format}') for reading).
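A minimal sketch of that suggestion follows. QueryCache is a hypothetical stand-in for the class whose __init__ is quoted above; only the file-format handling is shown and the query logic is left out. The format string doubles as the file extension and picks the pandas writer and reader via getattr.

```python
from pathlib import Path

import pandas as pd


class QueryCache:
    """Hypothetical stand-in for the class quoted above (format handling only)."""

    def __init__(self, file_name, format="csv"):
        self.format = format
        # Use the format as the file extension so every filename follows it.
        self.path = Path(file_name).with_suffix(f".{format}")

    def save(self, df):
        # getattr picks df.to_csv, df.to_parquet, ... from the format string.
        writer = getattr(df, f"to_{self.format}")
        writer(self.path, index=False)

    def load(self):
        # Same trick on the pandas module to get the matching reader.
        reader = getattr(pd, f"read_{self.format}")
        return reader(self.path)
```

Formats whose writers need extra arguments would still need special-casing; DataFrame.to_hdf, for instance, requires a key and the PyTables package.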

Development

No branches or pull requests

5 participants