This repository has been archived by the owner on Oct 10, 2022. It is now read-only.

New major release - radio / YouTube / data quality distillation

Pre-release
@snakers4 released this 02 Jul 06:33

TLDR:

  • 855 GB of uncompressed audio (.wav, int16);
  • (new!) A new domain - radio;
  • (new!) A larger YouTube dataset with 1000+ additional hours;
  • (new!) A small (300 hours) YouTube dataset downloaded in maximum quality;
  • (new!) Three validation sets (YouTube, books, public calls) totalling 18 hours, with ground-truth annotations;
  • See this issue for the distilled lists of files with "bad" data.
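Because the data is stored as uncompressed int16 PCM, its size maps directly to duration. The sketch below shows that conversion; the sample rate, channel count, and the `pcm_hours` helper are assumptions for illustration only (the release notes do not state them), and WAV header overhead is ignored.

```python
# Hypothetical helper: estimate duration from the size of raw int16 PCM audio.
# Assumptions (NOT stated in the release notes): 16 kHz sample rate, mono,
# 2 bytes per sample (int16), WAV headers ignored, 1 GB = 10**9 bytes.

def pcm_hours(size_bytes: int, sample_rate: int = 16_000,
              channels: int = 1, bytes_per_sample: int = 2) -> float:
    """Return the duration in hours of uncompressed PCM audio of the given size."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return size_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    total_bytes = 855 * 10**9  # the 855 GB figure from the TLDR
    print(f"{pcm_hours(total_bytes):,.0f} hours under the assumptions above")
```

Changing `sample_rate` or `channels` scales the estimate linearly, so treat the result as a rough order of magnitude rather than an official dataset statistic.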