Information on how to locate all the files belonging to a given subdataset is important to the DAX API and to how it loads subdatasets. Note that this is different from a subdataset's format, which is simply its file format.
Examples of subdataset types:
- a simple file: its path name (e.g. txt, csv)
- a directory (e.g. a directory of image files, a directory of subdirectories)
- a list of files (e.g. a txt with paths to all files in the validation set)
- a regex (e.g. all train files have `train_` at the start of the filename)
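To make the four location types concrete, here is a minimal sketch of a resolver that turns each one into a list of files. The kind names (`file`, `directory`, `listing`, `regex`), the function `locate_files`, and the convention that regexes are matched against paths relative to a root directory are all hypothetical choices for illustration, not part of the DAX API.

```python
import re
from pathlib import Path
from typing import List


def locate_files(kind: str, value: str, root: str = ".") -> List[Path]:
    """Return the files that make up one subdataset.

    The kind names ("file", "directory", "listing", "regex") are
    hypothetical labels for the four examples in this issue, not DAX API.
    """
    base = Path(root)
    if kind == "file":
        # A simple file: the path name alone identifies the subdataset.
        return [base / value]
    if kind == "directory":
        # Every regular file under the directory, descending into subdirectories.
        return sorted(p for p in (base / value).rglob("*") if p.is_file())
    if kind == "listing":
        # A text file whose non-empty lines are paths relative to root.
        lines = (base / value).read_text().splitlines()
        return [base / line.strip() for line in lines if line.strip()]
    if kind == "regex":
        # Files whose path relative to root fully matches the regex,
        # e.g. r"images/train_.*" for files whose names start with "train_".
        pattern = re.compile(value)
        return sorted(
            p
            for p in base.rglob("*")
            if p.is_file() and pattern.fullmatch(p.relative_to(base).as_posix())
        )
    raise ValueError(f"unknown subdataset type: {kind!r}")
```

Only the `file` case needs no filesystem access; the other three must inspect the disk, which is one reason the loader needs to know the type explicitly.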
We need to determine what subdataset types there are and how to include this information. The current proposal is this:
Simple file:
```yaml
- file_name: noaa-weather-data-jfk-airport/jfk_weather.csv
  ...
  format: CSV # rename from type to format
  ...
  type: path_name
  value: noaa-weather-data-jfk-airport/jfk_weather.csv
```
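The simple-file entry keeps two independent questions apart: `type`/`value` say *where* the files are, and `format` says *how* to parse them. A minimal sketch of a loader that respects that split might look like this; the `READERS` registry, `load_subdataset`, and the use of the standard `csv` module are assumptions for illustration, and the elided `...` fields of the entry are simply left out.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List


def read_csv(path: Path) -> List[dict]:
    # Parse a CSV file with a header row into a list of row dicts.
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical registry: the `format` field selects how each file is parsed.
READERS: Dict[str, Callable[[Path], List[dict]]] = {"CSV": read_csv}


def load_subdataset(entry: dict, root: str = ".") -> List[dict]:
    """Load an entry shaped like the simple-file proposal above."""
    # Step 1: the `type`/`value` pair answers *where* the files are.
    if entry["type"] != "path_name":
        raise NotImplementedError(f"type {entry['type']!r} is not sketched here")
    paths = [Path(root) / entry["value"]]
    # Step 2: `format` answers *how* to read each file, independently of step 1.
    reader = READERS[entry["format"]]
    rows: List[dict] = []
    for path in paths:
        rows.extend(reader(path))
    return rows
```

Because the two lookups are independent, supporting a new location type never requires touching the readers, and vice versa.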
How about using a `pattern` label as an umbrella? Technically, everything you have listed is a pattern, which may or may not include a wildcard.
```yaml
- pattern: /path/to/dir/file.csv
  ...
  format: CSV
  ...
  type: file
- pattern: /path/to/dir/*.csv
  ...
  format: CSV
  ...
  type: regex
- pattern: /path/to/dir/*
  ...
  format:
  ...
  type: regex
- pattern: /path/to/dir/file.txt
  ...
  format: CSV
  ...
  type: listing # CSV file containing one column only, which has a special meaning
```
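A minimal sketch of how a loader might expand these flat `pattern` entries. The function name `resolve_pattern` is hypothetical, and it assumes the listing file has no header row. Note that `*.csv` and `*` are shell-style wildcards rather than regular expressions, so the sketch expands `type: regex` entries with Python's `glob` module instead of `re`.

```python
import csv
import glob
from typing import List


def resolve_pattern(entry: dict) -> List[str]:
    """Expand one flat `pattern` entry, as in the proposal above, into paths."""
    pattern, kind = entry["pattern"], entry["type"]
    if kind == "file":
        # A plain path with no wildcard is returned unchanged.
        return [pattern]
    if kind == "regex":
        # "*.csv" and "*" are shell-style wildcards, so they are expanded with
        # glob.glob; sorted because glob returns matches in arbitrary order.
        return sorted(glob.glob(pattern))
    if kind == "listing":
        # Single-column CSV (assumed to have no header row): one path per row.
        with open(pattern, newline="") as f:
            return [row[0] for row in csv.reader(f) if row]
    raise ValueError(f"unknown pattern type: {kind!r}")
```

One consequence worth noting: `glob.glob("/path/to/dir/*")` also returns subdirectories, so a real loader would probably filter the result with `os.path.isfile` or recurse into them, depending on what the third entry is meant to cover.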
Makes sense. Semantically, it might be cleaner to move `type` under `pattern` instead of keeping it at the same level (e.g., we may want more fields describing the pattern itself in the future).
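The suggested nesting can be expressed as a small conversion from the flat layout. This is only a sketch: the function `nest_pattern_type` and the key name `value` for the path inside the nested `pattern` mapping are hypothetical choices, not part of either proposal.

```python
from typing import Any, Dict


def nest_pattern_type(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Move `type` under `pattern`, as suggested in the reply above.

    Flat {pattern: str, type: str, ...} becomes
    {pattern: {value: str, type: str}, ...}. The key name "value" is a
    hypothetical choice for illustration.
    """
    # Copy every other top-level key unchanged; the input is not mutated.
    nested = {k: v for k, v in entry.items() if k not in ("pattern", "type")}
    nested["pattern"] = {"value": entry["pattern"], "type": entry["type"]}
    return nested
```

With this shape, any future attributes of the pattern would sit next to `type` inside `pattern`, leaving the top-level keys (such as `format`) untouched.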
There's probably a better way of structuring this that avoids `file_name` being the same as the `value` field in some cases, but it's a start.
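One possible way to remove the duplication between `file_name` and `value`, sketched under the assumption that `file_name` only serves as a name for the subdataset: make it optional, omit it when it merely repeats `value`, and fall back to `value` when it is absent. Both helper functions below are hypothetical.

```python
from typing import Any, Dict


def compact_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Drop file_name when it merely repeats value (hypothetical convention)."""
    if entry.get("file_name") == entry.get("value"):
        return {k: v for k, v in entry.items() if k != "file_name"}
    # Keep a distinct file_name; return a copy so the input is not mutated.
    return dict(entry)


def display_name(entry: Dict[str, Any]) -> str:
    """Return file_name if present, otherwise fall back to value."""
    return entry["file_name"] if "file_name" in entry else entry["value"]
```

Under this convention the simple-file entry would only spell out its path once, while regex or listing entries could still carry a separate, human-readable `file_name`.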