based on:
- NeMo (https://github.com/NVIDIA/NeMo)
- ESPnet (https://github.com/espnet/espnet)
- deepspeech.pytorch (https://github.com/SeanNaren/deepspeech.pytorch)
PyTorch implementation of DeepSpeech2 trained with the CTC objective.
Differences from deepspeech.pytorch:
- no warp-ctc; the loss is torch.nn.CTCLoss (a minimal sketch follows this list)
- training loop powered by pytorch-lightning
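A minimal sketch of how those two points fit together, not the repo's actual module: the encoder layers, class count (e.g. 28 characters plus the CTC blank) and batch layout are assumptions for illustration.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl


class CTCSpeechModule(pl.LightningModule):
    """Illustrative LightningModule; layer sizes and batch format are assumptions."""

    def __init__(self, num_features: int = 161, num_classes: int = 29):
        super().__init__()
        # placeholder encoder; DeepSpeech2 actually stacks 2D convolutions and bidirectional RNNs
        self.encoder = nn.GRU(num_features, 512, num_layers=3,
                              bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * 512, num_classes)
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)  # torch.nn.CTCLoss instead of warp-ctc

    def forward(self, features):
        # features: (batch, time, num_features) -> log-probs: (batch, time, num_classes)
        encoded, _ = self.encoder(features)
        return self.classifier(encoded).log_softmax(dim=-1)

    def training_step(self, batch, batch_idx):
        # assumed batch layout: padded features/targets plus their lengths
        features, feature_lengths, targets, target_lengths = batch
        log_probs = self(features).transpose(0, 1)  # CTCLoss expects (time, batch, classes)
        loss = self.ctc_loss(log_probs, targets, feature_lengths, target_lengths)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=3e-4)
```

Training would then be driven by `pytorch_lightning.Trainer().fit(...)`, which also writes the `epoch=N.ckpt` checkpoints used in the evaluation below.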
- results after 8 epochs (~24 hours of training) with the Adam optimizer:
python evaluation.py --model epoch=8.ckpt --datasets test-clean
2528 of 2620 samples are suitable for training
100%|█████████████████████████████████████| 127/127 [02:12<00:00, 1.04s/it]
Test Summary Average WER 9.925 Average CER 3.239
python evaluation.py --model epoch=8.ckpt --datasets test-other
2893 of 2939 samples are suitable for training
100%|███████████████████████████████████████| 145/145 [01:19<00:00, 1.83it/s]
Test Summary Average WER 27.879 Average CER 11.739
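WER and CER in the summaries above are word- and character-level edit-distance rates (reported here as percentages). A self-contained sketch of those metrics, not the repo's evaluation.py:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            # substitution uses prev = dp[i-1][j-1]; the other terms are insert/delete
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```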
- to download the data, see: https://github.com/dertilo/speech-to-text/corpora/download_corpora.py
- splits (one way to fetch them is sketched after this list):

      datasets = [
          ("train", ["train-clean-100", "train-clean-360", "train-other-500"]),
          ("eval", ["dev-clean", "dev-other"]),
          ("test", ["test-clean", "test-other"]),
      ]
- number of samples:
  - train: 281241 samples
  - eval: 5567 samples
  - test: 5559 samples
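As an illustration only (not the download_corpora.py script linked above), the listed LibriSpeech splits can also be fetched and concatenated with torchaudio; the data root is an assumption:

```python
from torch.utils.data import ConcatDataset
from torchaudio.datasets import LIBRISPEECH

SPLITS = {
    "train": ["train-clean-100", "train-clean-360", "train-other-500"],
    "eval": ["dev-clean", "dev-other"],
    "test": ["test-clean", "test-other"],
}


def load_split(name: str, root: str = "data"):
    # each item is (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)
    return ConcatDataset(
        [LIBRISPEECH(root, url=url, download=True) for url in SPLITS[name]]
    )
```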