GenVox is an end-to-end Python-based neural speech synthesis toolkit that provides both Text-to-Speech (TTS) and vocoder models. GenVox uses PyTorch as its neural backend. GenVox's coding style is heavily inspired by Coqui-AI, and its data flow is inspired by ESPnet. The model architecture code was adapted from various sources and heavily modified for better readability, quality, and optimization. Read the devlog for more details on the development.
🥣 Recipes for training your model from scratch with WandB support for logging.
⚒ Tools for creating and processing datasets, and for data analysis.
⏱ Code for optimized audio and text processing, logging, and pipelines with server-specific optimizations.
🔥 Designed to make adding your own custom architectures easy (see the illustrative sketch below).
Supported TTS (Text2Mel):
- Tacotron2 - paper - original repo
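For instance, since Tacotron2 is ultimately a PyTorch model that the toolkit loads by class (see the inference example at the bottom), a custom Text2Mel architecture is, at its core, another torch.nn.Module. The exact base class and registration hooks GenVox expects live in the repo; the sketch below is purely illustrative, and the MyText2Mel class, its constructor arguments, and its output shape are all hypothetical.

```python
# Illustrative sketch only: a hypothetical custom Text2Mel module.
# GenVox's actual model interface (base class, config handling, losses)
# is defined in the repo; this just shows the general PyTorch shape.
import torch
import torch.nn as nn

class MyText2Mel(nn.Module):  # hypothetical name and signature
    def __init__(self, vocab_size: int = 148, mel_dim: int = 80, hidden_dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        self.encoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.mel_head = nn.Linear(2 * hidden_dim, mel_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, text_len) -> mel frames: (batch, text_len, mel_dim)
        x = self.embedding(token_ids)
        x, _ = self.encoder(x)
        return self.mel_head(x)
```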
Clone the repo using any one of the following methods:
# recommended method
git clone https://github.com/saiakarsh193/GenVox
cd GenVox
git checkout dev

# or clone the dev branch directly
git clone -b dev https://github.com/saiakarsh193/GenVox

# or clone only the dev branch, without tracking the other branches
git clone -b dev --single-branch https://github.com/saiakarsh193/GenVox
git remote show origin # to check everything is correctly tracked
NOTE: If running on the ADA cluster
sinteractive -c 8 -g 1 # to prevent out-of-memory issues
module load u18/cuda/10.2
module load u18/ffmpeg/5.0.1
Setting up the environment
# create an environment with Python 3.8 (tested and working) and activate it
conda create --name genvox python=3.8
conda activate genvox
# or
conda create --prefix ./genvox python=3.8
conda activate ./genvox
pip install -r requirements.txt # to install the dependencies
# to use wandb for logging, you need to log in (only once)
wandb login # then enter your API key (you can find it in your browser at https://wandb.ai/authorize)
# after all the hard work, you can finally run the code
python3 run.py
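Before launching a run, it can help to confirm that the environment is healthy, i.e. that PyTorch can see a GPU and that wandb is authenticated. This sanity check uses only plain PyTorch and wandb calls and is not GenVox-specific:

```python
import torch
import wandb

# check GPU visibility (training expects at least one CUDA device)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))

# check wandb authentication (returns True if an API key is already
# configured, otherwise it will prompt for one)
print("wandb logged in:", wandb.login())
```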
Check demo.py for more details
import scipy.io.wavfile
from core.synthesizer import Synthesizer
from models.tts.tacotron2 import Tacotron2

# load the trained TTS model from its config and checkpoint
syn = Synthesizer(
    tts_model_class=Tacotron2,
    tts_config_path=<path/to/config>,
    tts_checkpoint_path=<path/to/checkpoint>
)

# synthesize speech and write it to a wav file
outputs = syn.tts(text="Hello world! This is a test sentence.")
scipy.io.wavfile.write("pred_sig.wav", outputs["sampling_rate"], outputs["waveform"])
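Since `syn.tts` returns a dictionary containing the waveform and its sampling rate, synthesizing multiple sentences is just a loop over the same call (reusing the `syn` object created above; the sentences and output file names here are arbitrary):

```python
# synthesize several sentences with the same Synthesizer instance
sentences = [
    "GenVox is a neural speech synthesis toolkit.",
    "This is the second test sentence.",
]
for i, text in enumerate(sentences):
    outputs = syn.tts(text=text)
    scipy.io.wavfile.write(f"pred_{i}.wav", outputs["sampling_rate"], outputs["waveform"])
```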