[Feature] Automate DataLoader as a default option #168

GaoxiangLuo · 2022-07-06T18:18:36Z

Is your feature request related to a problem? Please describe.
The dataset configuration file has an unused tag, dataFormat. It would be convenient to use it to automate basic data loading by default once the data has been downloaded from the URL, while still letting users supply their own customized data loading. This is useful at production scale, where many datasets do not need fine-grained, dataset-specific pre-processing and can rely on the default option.

Describe the solution you'd like
If a dataset is in npy format, the loader reads the single numpy array. If it is in npz format, the loader reads the numpy arrays into a dictionary with the array names (headers) as keys and the arrays as values. If it is in csv or stata format, the loader reads it as a pandas DataFrame. If it is in zip format, the loader unzips it. If it is a Python pickle file, the loader loads its content. If it consists of images for a classification task, the loader constructs the dataset by using folder names as labels. This requires users to put the images into the right sub-folders.
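
The dispatch described above could be sketched roughly as follows. This is only an illustration, not the project's implementation: the function name `load_default`, the `custom_loader` parameter, and the exact dataFormat strings (including `image_folder`) are assumptions made for the example. numpy and pandas are imported lazily so they are only needed for the formats that use them.

```python
import pickle
import zipfile
from pathlib import Path


def load_default(path, data_format, custom_loader=None):
    """Hypothetical default loader that dispatches on the dataFormat tag.

    If custom_loader is given, it bypasses the default behaviour entirely.
    """
    path = Path(path)
    if custom_loader is not None:
        return custom_loader(path)

    fmt = data_format.lower().lstrip(".")
    if fmt == "npy":
        import numpy as np
        return np.load(path)
    if fmt == "npz":
        import numpy as np
        # Materialize every array so the archive can be closed afterwards.
        with np.load(path) as archive:
            return {name: archive[name] for name in archive.files}
    if fmt == "csv":
        import pandas as pd
        return pd.read_csv(path)
    if fmt in ("stata", "dta"):
        import pandas as pd
        return pd.read_stata(path)
    if fmt == "zip":
        # Extract next to the archive; assumes the file name ends in ".zip".
        target = path.with_suffix("")
        with zipfile.ZipFile(path) as archive:
            archive.extractall(target)
        return target
    if fmt in ("pickle", "pkl"):
        # Unpickling can execute arbitrary code: only use on trusted files.
        with open(path, "rb") as handle:
            return pickle.load(handle)
    if fmt == "image_folder":
        # One sub-folder per class; the folder name is used as the label.
        return [(p, p.parent.name) for p in sorted(path.glob("*/*")) if p.is_file()]
    raise ValueError(f"No default loader for dataFormat={data_format!r}")
```

A call such as `load_default("data/train.csv", "csv")` would return a DataFrame, while passing `custom_loader=my_loader` skips the defaults and hands the path to the user's own function.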

Describe alternatives you've considered
A data loader is an essential component of an ML task. Besides the default option the system provides, users can bypass it and override it with their own customized data loaders.

GaoxiangLuo added the enhancement New feature or request label Jul 6, 2022
