This repository has been archived by the owner on Oct 10, 2022. It is now read-only.
Releases: snakers4/open_stt
First release
Yeah, we are building the largest open STT dataset for the Russian language.
Because we believe that life is not a zero-sum game.
This release mostly consists of our attempts to:
- See what is available;
- Gather and document work done before us in one place.
Historical composition and downloads
(Old download links will be discarded every iteration or two)
Type | Utterances | Hours | GB | Speaker sets | Characters | Mean length, seconds | Mean chars |
---|---|---|---|---|---|---|---|
Lecture | 6,803 | 6.3 | 1.9 | 29 | 316,953 | 3.36 | 46.6 |
Narration | 67,052 | 80.3 | 27.5 | 584 | 3,075,827 | 4.31 | 45.9 |
Phone_calls | 233,868 | 211.2 | 45.9 | 8,175 | 6,706,717 | 3.25 | 28.7 |
Series | 20,243 | 17.5 | 5.2 | 51 | 759,433 | 3.10 | 37.5 |
Total | 327,966 | 315 | 80 | 8,839 | 10,858,930 | | |
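
The two mean columns appear to follow from the others: mean length looks like hours × 3600 / utterances, and mean chars looks like characters / utterances (the per-type rows match this to within rounding). Under that assumption, the sketch below derives both values for the Total row, which the table leaves blank. The hours figure is rounded, so the result is approximate, and the variable names are illustrative only.

```shell
# Values copied from the Total row of the table.
utterances=327966
hours=315
chars=10858930

# Assumed relation: mean length (s) = hours * 3600 / utterances.
mean_seconds=$(awk -v h="$hours" -v n="$utterances" 'BEGIN { printf "%.2f", h * 3600 / n }')

# Assumed relation: mean chars = characters / utterances.
mean_chars=$(awk -v c="$chars" -v n="$utterances" 'BEGIN { printf "%.1f", c / n }')

echo "mean length: ${mean_seconds} s, mean chars: ${mean_chars}"
```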
- Download the chunks:
wget https://ru-open-stt-v01.ams3.digitaloceanspaces.com/ru_open_stt_v01.tar.gz_aa
wget https://ru-open-stt-v01.ams3.digitaloceanspaces.com/ru_open_stt_v01.tar.gz_ab
wget https://ru-open-stt-v01.ams3.digitaloceanspaces.com/ru_open_stt_v01.tar.gz_ac
wget https://ru-open-stt-v01.ams3.digitaloceanspaces.com/ru_open_stt_v01.tar.gz_ad
wget https://ru-open-stt-v01.ams3.digitaloceanspaces.com/ru_open_stt_v01.tar.gz_ae
wget https://ru-open-stt-v01.ams3.digitaloceanspaces.com/ru_open_stt_v01.tar.gz_af
For multi-threaded downloads, use aria2 with the -x flag (maximum connections per server).
- Download the metadata:
wget https://ru-open-stt-v01.ams3.digitaloceanspaces.com/ru_open_stt_v01_public.csv
- Put the chunks together:
cat ru_open_stt_v01.tar.gz_* > ru_open_stt_v01.tar.gz
- Unpack
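
The download, reassembly, and unpacking steps above can be sketched as a handful of shell functions. This is a minimal sketch, not the maintainers' own tooling: the function names (`chunk_urls`, `download_chunks`, `assemble`, `unpack`), the connection count of 8, and the `gzip -t` integrity check are assumptions added for illustration. It also assumes the reassembled file is an ordinary gzip-compressed tar archive, as its name suggests.

```shell
BASE_URL="https://ru-open-stt-v01.ams3.digitaloceanspaces.com"
ARCHIVE="ru_open_stt_v01.tar.gz"

# Print one URL per chunk (suffixes aa..af, as listed above).
chunk_urls() {
  for suffix in aa ab ac ad ae af; do
    echo "${BASE_URL}/${ARCHIVE}_${suffix}"
  done
}

# Download all chunks with aria2c: -x 8 allows up to 8 connections
# per server, and -i - reads the URL list from standard input.
download_chunks() {
  chunk_urls | aria2c -x 8 -i -
}

# Concatenate the chunks in the current directory (the glob expands in
# sorted order), then test that the resulting gzip stream is intact.
assemble() {
  cat "${ARCHIVE}"_* > "${ARCHIVE}"
  gzip -t "${ARCHIVE}"
}

# Extract the archive into the given directory (default: current directory).
unpack() {
  dest="${1:-.}"
  mkdir -p "${dest}"
  tar -xzf "${ARCHIVE}" -C "${dest}"
}
```

With these functions loaded into a shell, `download_chunks && assemble && unpack data` would run the whole flow, where `data` is a hypothetical target directory. Running `gzip -t` before extraction is a cheap way to catch a missing or truncated chunk early.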