Describe the bug
I am trying to download only the 'validation' split of my dataset, but the call fails with datasets.exceptions.ExpectedMoreSplitsError.
This appears to be the same undesired behavior as reported in #6939, but with data_files rather than data_dir.
Here is the traceback:
Traceback (most recent call last):
  File "/home/user/app/app.py", line 12, in <module>
    ds = load_dataset('datacomp/imagenet-1k-random0.0', token=GATED_IMAGENET, data_files={'validation': 'data/val*'}, split='validation', trust_remote_code=True)
  File "/usr/local/lib/python3.10/site-packages/datasets/load.py", line 2154, in load_dataset
    builder_instance.download_and_prepare(
  File "/usr/local/lib/python3.10/site-packages/datasets/builder.py", line 924, in download_and_prepare
    self._download_and_prepare(
  File "/usr/local/lib/python3.10/site-packages/datasets/builder.py", line 1018, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/info_utils.py", line 68, in verify_splits
    raise ExpectedMoreSplitsError(str(set(expected_splits) - set(recorded_splits)))
datasets.exceptions.ExpectedMoreSplitsError: {'train', 'test'}
Note: I am passing the data_files argument only to restrict the download to the 'validation' split. Without data_files, the whole dataset is downloaded even when split='validation' is specified, as described here: https://discuss.huggingface.co/t/how-can-i-download-a-specific-split-of-a-dataset/79027
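To make the failure mode concrete, here is a minimal sketch of the comparison shown in the last frame of the traceback. It is a simplified stand-in, not the library's actual code: the function name verify_splits_sketch and the local exception class are hypothetical, and the list of expected splits is an assumption inferred from the error message.

```python
# Simplified stand-in for the check at datasets/utils/info_utils.py, line 68
# (see the traceback). This is NOT the library's code, only an illustration.


class ExpectedMoreSplitsError(Exception):
    """Local stand-in for datasets.exceptions.ExpectedMoreSplitsError."""


def verify_splits_sketch(expected_splits, recorded_splits):
    # Any split named in the dataset metadata but not produced by this
    # load is treated as missing, and the load is aborted.
    missing = set(expected_splits) - set(recorded_splits)
    if missing:
        raise ExpectedMoreSplitsError(str(missing))


# Assumption: the dataset metadata lists these three splits.
expected = ["train", "validation", "test"]
# data_files={'validation': 'data/val*'} yields only this split.
recorded = ["validation"]

try:
    verify_splits_sketch(expected, recorded)
    missing_reported = None
except ExpectedMoreSplitsError as err:
    missing_reported = str(err)

print(missing_reported)
```

Under these assumptions, restricting data_files to one split makes the recorded splits a strict subset of the expected ones, which would explain why 'train' and 'test' are reported as missing.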
Steps to reproduce the bug
ds = load_dataset('datacomp/imagenet-1k-random0.0', token=GATED_IMAGENET, data_files={'validation': 'data/val*'}, split='validation', trust_remote_code=True)
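A possible workaround, sketched here and not verified against this dataset, is to pass verification_mode="no_checks" to load_dataset. Since the error is raised from verify_splits, skipping verification may sidestep it, at the cost of disabling the library's other integrity checks as well. The helper names validation_only_kwargs and load_validation_only are hypothetical.

```python
def validation_only_kwargs(token):
    """Build the arguments from the reproduction call, plus verification_mode."""
    return {
        "path": "datacomp/imagenet-1k-random0.0",
        "token": token,
        "data_files": {"validation": "data/val*"},
        "split": "validation",
        "trust_remote_code": True,
        # Assumption: skipping verification avoids the verify_splits call
        # seen in the traceback. This also turns off other checks.
        "verification_mode": "no_checks",
    }


def load_validation_only(token):
    # Imported lazily so the helper above can be inspected without
    # the datasets package installed or network access.
    from datasets import load_dataset

    return load_dataset(**validation_only_kwargs(token))


# Usage (requires access to the gated dataset):
# ds = load_validation_only(GATED_IMAGENET)
```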
Expected behavior
Only the 'validation' split is downloaded and loaded.
Environment info
Default environment for creating a new Space. Relevant to this bug, that is: