
voicera_demo.mp4
"Voicera" is a text-to-speech (TTS) model designed for generating speech from written text. It uses a GPT-2 type architecture, which helps in creating natural and expressive speech. The model converts audio into tokens using the "Multi-Scale Neural Audio Codec (SNAC)" model, allowing it to understand and produce speech sounds. Voicera aims to provide clear and understandable speech, focusing on natural pronunciation and intonation. It's a project to explore TTS technology and improve audio output quality.
- Data Preparation: We use a dataset containing text paired with corresponding audio. The audio is tokenized using the Multi-Scale Neural Audio Codec (SNAC) model, which converts it into a sequence of tokens that the model can process (see the first sketch after this list).
- Model Architecture: Voicera uses a transformer-based architecture similar to GPT-2, which is adept at handling sequential data. This architecture allows the model to capture the nuances of language and generate coherent speech (see the second sketch after this list).
- Training: The model is trained on a large dataset of paired text and audio tokens. After each epoch, the model's performance is evaluated to ensure the generated audio improves over time (see the third sketch after this list).
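
The first sketch below shows how the audio tokenization step might look, assuming the publicly available `snac` package and the `hubertsiuzdak/snac_24khz` checkpoint. The frame-wise interleaving used to flatten the multi-scale codes into one token stream is an illustrative assumption, not necessarily the exact layout Voicera uses.

```python
import torch
from snac import SNAC

# Load a pretrained SNAC codec (checkpoint name is an assumption; pick the
# variant that matches your dataset's sample rate).
codec = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# Dummy 1-second mono clip at 24 kHz, shape (batch, channels, samples).
audio = torch.randn(1, 1, 24000)

with torch.inference_mode():
    codes = codec.encode(audio)   # list of code tensors, one per time scale
    recon = codec.decode(codes)   # reconstruct audio from the codes

# Flatten the multi-scale codes into a single token sequence for the language
# model. This simple per-frame interleave is an assumption for illustration.
def flatten_codes(codes):
    c1, c2, c3 = [c[0] for c in codes]   # the 24 kHz codec has 3 scales
    seq = []
    for t in range(c1.shape[0]):
        seq += [c1[t], c2[2 * t], c2[2 * t + 1],
                c3[4 * t], c3[4 * t + 1], c3[4 * t + 2], c3[4 * t + 3]]
    return torch.stack(seq)

tokens = flatten_codes(codes)
print(tokens.shape)   # one flat audio-token sequence per clip
```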
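
The second sketch treats the model as a GPT-2 language model whose vocabulary is extended with the SNAC audio tokens plus a few special markers. The sizes and special tokens here are assumptions for illustration, not Voicera's actual configuration.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative sizes only; the real hyperparameters may differ.
TEXT_VOCAB = 50257        # standard GPT-2 BPE vocabulary
AUDIO_VOCAB = 3 * 4096    # SNAC codebook entries, one offset block per scale (assumed)
SPECIAL = 3               # e.g. <start_of_audio>, <end_of_audio>, <pad> (assumed)

config = GPT2Config(
    vocab_size=TEXT_VOCAB + AUDIO_VOCAB + SPECIAL,
    n_positions=2048,     # long enough for a text prompt plus its audio tokens
    n_embd=768,
    n_layer=12,
    n_head=12,
)
model = GPT2LMHeadModel(config)

# Text tokens and offset audio tokens share one embedding table, so the model
# predicts audio tokens autoregressively, conditioned on the text prompt.
```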
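
The third sketch shows how training reduces to next-token prediction over the concatenated text + audio token sequence. The dataset class, batch size, and learning rate are placeholders; this is an outline of the loop, not the project's actual training script.

```python
import torch
from torch.utils.data import DataLoader

# Assumes each batch provides "input_ids" laid out as
# [text tokens ... <start_of_audio> audio tokens ... <end_of_audio> pads]
# together with an "attention_mask". The dataset itself is hypothetical.
def train(model, dataset, epochs=10, lr=3e-4, device="cuda"):
    model.to(device).train()
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)

    for epoch in range(epochs):
        for batch in loader:
            input_ids = batch["input_ids"].to(device)
            mask = batch["attention_mask"].to(device)

            # Ignore padding positions in the loss; GPT2LMHeadModel shifts the
            # labels internally, so this is plain next-token cross-entropy.
            labels = input_ids.masked_fill(mask == 0, -100)
            out = model(input_ids=input_ids, attention_mask=mask, labels=labels)
            out.loss.backward()
            optim.step()
            optim.zero_grad()

        # After each epoch, a sample can be generated and decoded with SNAC
        # to check that the audio is improving (evaluation details omitted).
        print(f"epoch {epoch}: loss {out.loss.item():.4f}")
```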
The video above shows the model's capabilities.
There are three models: the base model and two finetunes on the Jenny and Expresso datasets. The best of the three is currently the Jenny finetune. Here are the Colab links to all three, respectively: