Add flexibility to save downloaded queries in different file formats #29

shouryakhanna · 2023-03-10T13:42:02Z

Add functionality to save downloaded queries in different file formats, to avoid heavy CSV files. How to handle the additional header information still has to be sorted out.

emilyhunt · 2023-04-03T10:35:04Z

I'd really appreciate it if support for the parquet format could be added, since it's a much quicker format for large dataframes when saving and loading tables.

Maybe the information in the header could be stored in an accompanying metadata file or in the filename?
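A minimal sketch of the accompanying-metadata-file idea, using only the standard library. The helper names, the `.meta.json` suffix, and the metadata keys are hypothetical and only illustrate the approach; the thread does not specify what the header actually contains.

```python
import json
from pathlib import Path


def sidecar_path(data_path):
    """Return the path of the metadata file that sits next to data_path."""
    data_path = Path(data_path)
    # e.g. "query.parquet" -> "query.meta.json" (naming scheme is an assumption)
    return data_path.with_name(data_path.stem + ".meta.json")


def write_sidecar(data_path, metadata):
    """Save the header information as JSON next to the data file."""
    path = sidecar_path(data_path)
    path.write_text(json.dumps(metadata, indent=2))
    return path


def read_sidecar(data_path):
    """Load the header information stored next to the data file."""
    return json.loads(sidecar_path(data_path).read_text())
```

Usage would be something like `write_sidecar("query.parquet", {"subsample_query": "...", "hplevel": 5})`, where the keys and values are placeholders. The data file itself then needs no header, whichever format it is stored in.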

HDF files would allow for information to be stored in headers and would allow for smaller file sizes, but wouldn't be nearly as fast as using parquet. (see second link above)

mfouesneau · 2024-09-11T09:11:58Z

gaiaunlimited/src/gaiaunlimited/selectionfunctions/subsample.py

Line 191 in a6bbe61

def __init__(self, subsample_query, file_name, hplevel_and_binning):

You can add a keyword argument, e.g. format='csv', and then use it in all the filenames transparently. The only place that needs more is the pandas calls: you will have to do getattr(df, f'to_{format}') to get the writing function (and similarly for reading).
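The getattr-based dispatch suggested above could look roughly like the sketch below. The function names `write_table` and `read_table` are hypothetical and are not part of gaiaunlimited, and the argument is called `fmt` here only to avoid shadowing the Python builtin `format`. Some pandas writers need extra arguments or optional dependencies (for example, `to_hdf` requires a `key`, and `to_parquet` needs `pyarrow` or `fastparquet` installed), so a real implementation would probably need some per-format handling.

```python
import pandas as pd


def write_table(df, path, fmt="csv", **kwargs):
    """Write df to path with the pandas method named to_<fmt>."""
    writer = getattr(df, f"to_{fmt}", None)
    if writer is None:
        raise ValueError(f"pandas has no writer for format {fmt!r}")
    writer(path, **kwargs)


def read_table(path, fmt="csv", **kwargs):
    """Read a table from path with the pandas function named read_<fmt>."""
    reader = getattr(pd, f"read_{fmt}", None)
    if reader is None:
        raise ValueError(f"pandas has no reader for format {fmt!r}")
    return reader(path, **kwargs)
```

With this in place, the file name can be built from the same keyword, e.g. `write_table(df, f"{file_name}.{fmt}", fmt, index=False)` for CSV.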

agabrown self-assigned this Sep 11, 2024

alfredcas self-assigned this Sep 11, 2024

agabrown removed their assignment Sep 11, 2024

alfredcas added the enhancement label Sep 11, 2024
