This repository has been archived by the owner on Oct 10, 2022. It is now read-only.

Add torrent for v0.5-beta
snakers4 committed Jul 2, 2019
1 parent 34e9028 commit 9ca61e4
Showing 1 changed file with 20 additions and 21 deletions.
41 changes: 20 additions & 21 deletions README.md
@@ -17,7 +17,7 @@ Let's make STT in Russian (and more) as open and available as CV models.
**Planned releases:**
- 1000-10,000 additional hours of books;
- Data quality distillation and improvement / annotation improvement;
-- EVEN MORE DATA;
+- EVEN MOAR DATA (give us your ideas where to find it!);
- ~~1000+ additional hours of YouTube~~;
- ~~Some validation / test sets~~;
- ~~Plain benchmarks, "bad files"~~;
@@ -129,36 +129,35 @@ Also shared a wav version via torrent.

Save us a couple of bucks, download via torrent:
- An **MP3** [version](http://academictorrents.com/details/4a2656878dc819354ba59cd29b1c01182ca0e162) of the dataset (v3), to be deprecated;
-- A **WAV** [version](http://academictorrents.com/details/8823f9ffbcf41a58e504eb5c48a02f1db3189e4f) of the dataset (v4) - **BEING UPDATED NOW**;
+- A **WAV** [version](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) of the dataset (v5);

You can download separate files via torrent.
Try several torrent clients if some do not work.


## **Links**

Meta data [file](https://ru-open-stt.ams3.digitaloceanspaces.com/public_meta_data_v04_fx.csv).

| Dataset | GB, wav | GB, mp3 | Wav | Mp3 | Source | Manifest |
|---------------------------------------|------|----------------|-------|-----| -------| ----------|
-| audiobook_2 | 162 | 21.0 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/private_buriy_audiobooks_2_mp3.tar.gz) | Sources from the Internet + alignment | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/private_buriy_audiobooks_2.csv) |
-| radio_2 | 154 | 25.7 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/radio_2_mp3.tar.gz) | Radio | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/radio_2.csv) |
-| public_youtube1120 | 237 | 32.4 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube1120_mp3.tar.gz) | YouTube videos | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube1120.csv) |
-| asr_public_phone_calls_2 | 66 | 7.5 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_phone_calls_2_mp3.tar.gz) | Sources from the Internet + ASR | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_phone_calls_2.csv) |
-| public_youtube1120_hq | 31 | 8.6 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube1120_hq_mp3.tar.gz) | YouTube videos | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube1120_hq.csv) |
-| asr_public_stories_2 | 9 | 1.1 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_stories_2_mp3.tar.gz) | Sources from the Internet + alignment | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_stories_2.csv) |
-| tts_russian_addresses_rhvoice_4voices | 80.9 | 9.9 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/tts_russian_addresses_rhvoice_4voices_mp3.tar.gz) | TTS | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/tts_russian_addresses_rhvoice_4voices.csv) |
-| public_youtube700 | 75.0 | 9.6 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube700_mp3.tar.gz) | YouTube videos | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube700.csv) |
-| asr_public_phone_calls_1 | 22.7 | 2.6 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_phone_calls_1_mp3.tar.gz) | Sources from the Internet + ASR | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_phone_calls_1.csv) |
-| asr_public_stories_1 | 4.1 | 0.5 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_stories_1_mp3.tar.gz) | Public stories | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_stories_1.csv) |
-| public_series_1 | 1.9 | 0.2 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_series_1_mp3.tar.gz) | Public series | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_series_1.csv) |
-| ru_RU | 1.9 | 0.2 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/ru_ru_mp3.tar.gz) | Caito.de [dataset](https://www.caito.de/data/Training/stt_tts/) | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/ru_RU.csv) |
-| voxforge_ru | 1.9 | 0.2 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/voxforge_ru_mp3.tar.gz) | Voxforge [dataset](https://www.repository.voxforge1.org/downloads/) | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/voxforge_ru.csv) |
-| russian_single | 0.9 | 0.1 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/russian_single_mp3.tar.gz) | Russian single speaker [dataset](https://www.kaggle.com/bryanpark/russian-single-speaker-speech-dataset) | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/russian_single.csv) |
-| asr_calls_2_val | 2 | 0.2 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_calls_2_val_mp3.tar.gz) | Sources from the Internet | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_calls_2_val.csv) |
-| public_lecture_1 | 0.7 | 0.1 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_lecture_1_mp3.tar.gz) | Sources from the Internet + manual | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_lecture_1.csv) |
-| buriy_audiobooks_2_val | 1 | 0.15 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/buriy_audiobooks_2_val_mp3.tar.gz) | Books + manual | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/buriy_audiobooks_2_val.csv) |
-| public_youtube700_val | 2 | 0.13 | torrent | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube700_val_mp3.tar.gz) | YouTube videos + manual | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube700_val.csv) |
+| audiobook_2 | 162 | 21.0 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/private_buriy_audiobooks_2_mp3.tar.gz) | Sources from the Internet + alignment | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/private_buriy_audiobooks_2.csv) |
+| radio_2 | 154 | 25.7 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/radio_2_mp3.tar.gz) | Radio | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/radio_2.csv) |
+| public_youtube1120 | 237 | 32.4 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube1120_mp3.tar.gz) | YouTube videos | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube1120.csv) |
+| asr_public_phone_calls_2 | 66 | 7.5 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_phone_calls_2_mp3.tar.gz) | Sources from the Internet + ASR | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_phone_calls_2.csv) |
+| public_youtube1120_hq | 31 | 8.6 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube1120_hq_mp3.tar.gz) | YouTube videos | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube1120_hq.csv) |
+| asr_public_stories_2 | 9 | 1.1 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_stories_2_mp3.tar.gz) | Sources from the Internet + alignment | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_stories_2.csv) |
+| tts_russian_addresses_rhvoice_4voices | 80.9 | 9.9 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/tts_russian_addresses_rhvoice_4voices_mp3.tar.gz) | TTS | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/tts_russian_addresses_rhvoice_4voices.csv) |
+| public_youtube700 | 75.0 | 9.6 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube700_mp3.tar.gz) | YouTube videos | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube700.csv) |
+| asr_public_phone_calls_1 | 22.7 | 2.6 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_phone_calls_1_mp3.tar.gz) | Sources from the Internet + ASR | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_phone_calls_1.csv) |
+| asr_public_stories_1 | 4.1 | 0.5 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_stories_1_mp3.tar.gz) | Public stories | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_public_stories_1.csv) |
+| public_series_1 | 1.9 | 0.2 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_series_1_mp3.tar.gz) | Public series | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_series_1.csv) |
+| ru_RU | 1.9 | 0.2 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/ru_ru_mp3.tar.gz) | Caito.de [dataset](https://www.caito.de/data/Training/stt_tts/) | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/ru_RU.csv) |
+| voxforge_ru | 1.9 | 0.2 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/voxforge_ru_mp3.tar.gz) | Voxforge [dataset](https://www.repository.voxforge1.org/downloads/) | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/voxforge_ru.csv) |
+| russian_single | 0.9 | 0.1 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/russian_single_mp3.tar.gz) | Russian single speaker [dataset](https://www.kaggle.com/bryanpark/russian-single-speaker-speech-dataset) | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/russian_single.csv) |
+| asr_calls_2_val | 2 | 0.2 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_calls_2_val_mp3.tar.gz) | Sources from the Internet | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/asr_calls_2_val.csv) |
+| public_lecture_1 | 0.7 | 0.1 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_lecture_1_mp3.tar.gz) | Sources from the Internet + manual | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_lecture_1.csv) |
+| buriy_audiobooks_2_val | 1 | 0.15 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/buriy_audiobooks_2_val_mp3.tar.gz) | Books + manual | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/buriy_audiobooks_2_val.csv) |
+| public_youtube700_val | 2 | 0.13 | [torrent](http://academictorrents.com/details/a12a08b39cf3626407e10e01126cf27c198446c2) | [part1](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube700_val_mp3.tar.gz) | YouTube videos + manual | [link](https://ru-open-stt.ams3.digitaloceanspaces.com/public_youtube700_val.csv) |
| Total | 855 | 87.5 | | | | |

