This repository has been archived by the owner on Oct 10, 2022. It is now read-only.
New major release - radio / YouTube / data quality distillation
Pre-release
TLDR:
- 855 GB (in `.wav` format in `int16`), non-archived;
- (new!) A new domain - radio;
- (new!) A larger YouTube dataset with 1000+ additional hours;
- (new!) A small (300 hours) YouTube dataset downloaded in maximum quality;
- (new!) 18 hours in 3 validation sets for YouTube / books / public calls with ground truth annotations;
- See the distilled files with "bad" data in this issue;
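
Since the audio is distributed as `.wav` files in `int16`, a quick sanity check on a downloaded file can be sketched as below. This is a minimal illustration using only Python's standard `wave` module; the helper name `describe_wav` and the returned keys are assumptions made for this example and are not part of the dataset's tooling.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM .wav file (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        sample_width = wf.getsampwidth()  # bytes per sample; 2 means 16-bit
        frame_rate = wf.getframerate()    # samples per second
        n_frames = wf.getnframes()        # frames in the file
        return {
            "channels": wf.getnchannels(),
            "sample_rate": frame_rate,
            "is_int16": sample_width == 2,
            "duration_sec": n_frames / frame_rate,
        }
```

Calling `describe_wav` on a file and checking `is_int16` is one way to confirm that a download matches the stated format before further processing.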