Describe the bug
This code fails to load the dataset it just saved:
It raises ValueError: Couldn't infer the same data file format for all splits. Got {NamedSplit('train'): ('arrow', {}), NamedSplit('test'): ('json', {})}.

I believe this bug is caused by the logic that tries to infer the dataset format. It counts the most common file extension. However, a small dataset can fit in a single .arrow file and have two JSON metadata files, causing the format to be inferred as JSON.

Steps to reproduce the bug
Execute the code above.
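The suspected cause can be illustrated without the library. The following is a minimal sketch, assuming the inference is a plain majority vote over file extensions. `infer_format` is a hypothetical helper, not the actual `datasets` implementation, and the file listings are illustrative stand-ins for what a saved split directory might contain.

```python
from collections import Counter
from pathlib import PurePosixPath


def infer_format(file_names):
    """Hypothetical helper: return the most common file extension, without the dot."""
    extensions = [PurePosixPath(name).suffix.lstrip(".") for name in file_names]
    counts = Counter(ext for ext in extensions if ext)
    return counts.most_common(1)[0][0]


# Illustrative listing: a larger split stored as several .arrow shards
# plus two JSON metadata files.
train_files = [
    "train/data-00000-of-00003.arrow",
    "train/data-00001-of-00003.arrow",
    "train/data-00002-of-00003.arrow",
    "train/dataset_info.json",
    "train/state.json",
]

# Illustrative listing: a small split that fits in a single .arrow file,
# so the two JSON metadata files outnumber the data file.
test_files = [
    "test/data-00000-of-00001.arrow",
    "test/dataset_info.json",
    "test/state.json",
]

formats = {
    "train": infer_format(train_files),
    "test": infer_format(test_files),
}
print(formats)  # {'train': 'arrow', 'test': 'json'}
```

Under this assumption the two splits resolve to different formats, which matches the shape of the reported error. Counting only data files, rather than every file in the directory, would avoid the mismatch in this sketch.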
Expected behavior
The dataset is loaded successfully.
Environment info
datasets version: 2.20.0
huggingface_hub version: 0.23.4
fsspec version: 2024.5.0