This repository provides a PyTorch implementation of speech enhancement via regeneration. The algorithm is based on the original paper, but several changes were made to the feature extraction and, as a result, to the model parameters.
TODO list:
- add inference scripts
- implement streaming model and its inference
- provide multilingual enhancement models (and adapt feature extraction too)
- publish a PyPI package
- release pretrained models
This repository is tested on Ubuntu 16.04 with a 1080 Ti GPU.
- Python 3.7+ (follow the installation page)
- CUDA 10.0+ (guide for Ubuntu)
- libsndfile (on Ubuntu, install via sudo apt install libsndfile-dev)
- pip requirements (defined in requirements.txt, install via pip install -r requirements.txt):
  - hydra-core 1.0.6+
  - pytorch 1.7+
  - torchaudio 0.7.2+
  - librosa 0.8.0+
  - pytest 6.2.0+
  - transformers 4.3.0+, pyworld 0.2.12+, pyannote.audio 2.0+ (for feature extraction)
- (optional) ffmpeg (for .mp3 support; on Ubuntu, install via sudo apt install ffmpeg)
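To confirm that an environment meets the minimum versions in the requirements list, a small check along these lines may help. It is only a sketch: it assumes Python 3.8+ (for importlib.metadata), assumes PyTorch is installed under the distribution name torch, and compares only the leading numeric parts of each version string. MINIMUMS, parse_version, and check_requirements are names made up for this example and are not part of the repository.

```python
import re
from importlib import metadata  # standard library since Python 3.8

# Minimum versions copied from the requirements list; "torch" is the
# distribution name under which PyTorch is published on PyPI.
MINIMUMS = {
    "hydra-core": "1.0.6",
    "torch": "1.7",
    "torchaudio": "0.7.2",
    "librosa": "0.8.0",
    "pytest": "6.2.0",
    "transformers": "4.3.0",
    "pyworld": "0.2.12",
    "pyannote.audio": "2.0",
}


def parse_version(text, width=3):
    """Return the first `width` numeric components of a version string, zero-padded."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:width]]
    return tuple(numbers + [0] * (width - len(numbers)))


def check_requirements(minimums=MINIMUMS):
    """Map each package name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

Pre-release suffixes and local build tags (for example a +cu suffix on a PyTorch build) are ignored by this comparison, so treat the report as a quick sanity check rather than a strict resolver.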
git clone https://github.com/SolomidHero/speech-regeneration-enhancer
pip install -e ./speech-regeneration-enhancer
For training, use the DAPS dataset or any dataset with a similar file naming scheme (the folder structure doesn't matter):
data_folder/
  wav_1_clean.wav
  dirty/
    wav_1_recoder_bathroom.wav
    wav_2_microphone_street.wav
  some_sub_tree/
    wav_2_clean.wav
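To illustrate why only the file names matter, here is a minimal sketch of how clean and noisy recordings could be matched by name prefix regardless of where they sit in the tree. It assumes that a clean file ends in _clean and that its noisy versions share everything before that suffix, as in the example above. This is an illustration of the naming convention, not the repository's actual pairing code, and pair_clean_and_noisy is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path

CLEAN_SUFFIX = "_clean"  # assumed marker of a clean recording, as in the example tree


def pair_clean_and_noisy(root):
    """Map each clean file to the noisy files that share its name prefix."""
    # rglob walks the whole tree, so the folder layout is irrelevant.
    wavs = sorted(Path(root).rglob("*.wav"))
    clean = {
        p.stem[: -len(CLEAN_SUFFIX)]: p
        for p in wavs
        if p.stem.endswith(CLEAN_SUFFIX)
    }
    # Try longer prefixes first so a longer utterance name wins over a
    # shorter one that it happens to start with.
    prefixes = sorted(clean, key=len, reverse=True)

    pairs = defaultdict(list)
    for path in wavs:
        if path.stem.endswith(CLEAN_SUFFIX):
            continue
        for prefix in prefixes:
            if path.stem.startswith(prefix + "_"):
                pairs[clean[prefix]].append(path)
                break
    return dict(pairs)
```

With the example tree, wav_1_clean.wav would be paired with wav_1_recoder_bathroom.wav and wav_2_clean.wav with wav_2_microphone_street.wav, even though each pair lives in different folders.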
This repository uses Hydra for configuration (see the Hydra documentation), so training and inference are controlled by editing the config.yaml file alone. Parameters can also be overridden from the command line.
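The relationship between config.yaml and command-line parameters can be pictured with a plain-Python sketch: each dotted key=value argument replaces one nested entry of the config. Hydra does this itself, with a richer and stricter override grammar, so the code below is only an illustration. apply_overrides and parse_value are hypothetical helpers, and the default values are invented; only the key names (dataset.wav_dir, train.epochs, train.ckpt_dir) come from this README.

```python
import ast
import copy


def parse_value(text):
    """Interpret numbers and similar literals; leave anything else as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Unlike Hydra, this sketch silently creates missing sections.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical defaults standing in for config.yaml.
defaults = {
    "dataset": {"wav_dir": "data"},
    "train": {"epochs": 10, "ckpt_dir": "ckpts"},
}
config = apply_overrides(
    defaults, ["train.epochs=50", "train.ckpt_dir=/path/to/ckpts"]
)
print(config["train"])
```

Values given on the command line take precedence over the file for that run only; the defaults themselves are left untouched.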
After changing the config, you can check that your parameters are acceptable with either of these commands:
pytest # to check if everything is working
pytest tests/test_scripts.py # to check if training process can be done
- After downloading the data and editing the config, run the preprocessing script (feature extraction happens here):
preprocess.py dataset.wav_dir=/path/to/wavs # parameters can be added into config directly
- Finally, train the model:
train.py train.epochs=50 train.ckpt_dir=/path/to/ckpts # parameters can be added into config directly
Checkpoints for the generator and the rest of the training state (discriminator, optimizers) will then appear in /path/to/ckpts.
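When inspecting or resuming a run, it can be handy to locate the newest file in the checkpoint directory. The sketch below makes no assumption about checkpoint file names or formats, which this README does not specify. latest_checkpoint is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def latest_checkpoint(ckpt_dir):
    """Return the most recently modified file in `ckpt_dir`, or None if it has no files."""
    files = [p for p in Path(ckpt_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

For example, latest_checkpoint("/path/to/ckpts") would return the path of whichever file was written last during training.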