Huge thanks to RunDiffusion for supporting this project!
AudioLab is an open-source powerhouse for voice-cloning and audio separation, built with modularity and extensibility in mind. Whether you're an audio engineer, researcher, or just a curious tinkerer, AudioLab has you covered.
- Music Generation: Create music from scratch or remix existing tracks using YuE.
- Song Generation: Create full-length songs with vocals and instrumentals using DiffRhythm.
- Zonos Text-to-Speech: High-quality TTS with deep learning.
- Orpheus TTS: Real-time natural-sounding speech powered by large language models.
- Text-to-Speech: Clone voices and generate natural-sounding speech with Coqui TTS.
- Text-to-Audio: Generate sound effects and ambient audio from text descriptions using Stable Audio.
- Audio Separation: Isolate vocals, drums, bass, and other components from a track.
- Vocal Isolation: Distinguish lead vocals from background.
- Noise Removal: Get rid of echo, crowd noise, and unwanted sounds.
- Voice Cloning: Train high-quality voice models with just 30-60 minutes of data.
- Audio Super Resolution: Enhance and clean up audio.
- Remastering: Apply spectral characteristics from a reference track.
- Audio Conversion: Convert between popular formats effortlessly.
- Export to DAW: Easily create Ableton Live and Reaper projects from separated stems.
- Auto-preprocessing for voice model training.
- Merge separated sources back into a single file with ease.
Before you dive in, make sure you have:
- Python 3.10 – Because match statements exist, and fairseq is allergic to 3.11.
- CUDA 12.4 – Other versions? Maybe fine. Maybe not. Do you like surprises?
- Virtual Environment – Strongly recommended to avoid dependency chaos.
- Windows Users – You're in for an adventure! Zonos/Triton can be a pain. Make sure to install MSVC and add these paths to your environment variables:
  ```
  C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64
  C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx86\x86
  ```
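As a sketch, you can append those directories to PATH for the current cmd.exe session as shown below (the 14.42.34433 folder name depends on which MSVC version your Build Tools installed, so adjust it; for a permanent change, use the System Environment Variables dialog instead):

```
rem Extend PATH for the current cmd.exe session only
set "PATH=%PATH%;C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64"
set "PATH=%PATH%;C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx86\x86"
```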
Note: This project assumes basic Python knowledge. If you've never set up a virtual environment before... now's the time to learn!
If dependencies refuse to install on Windows, try the following:
- Install MSVC Build Tools:
- Ensure CUDA is correctly installed:
  - Check the version:
    ```
    nvcc --version
    ```
  - Download CUDA 12.4 if it is missing or the wrong version.
- DLL Errors? Try moving the necessary DLLs from `/libs` to:
  ```
  .venv\lib\site-packages\pandas\_libs\window
  .venv\lib\site-packages\sklearn\.libs
  C:\Program Files\Python310\
  ```
  (or wherever your Python is installed)
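As a rough sketch of that DLL fix on Windows (run from the repository root; which DLLs you actually need depends on the specific import error, and whether your environment folder is `venv` or `.venv` depends on how you created it):

```
rem cmd.exe: copy the bundled DLLs next to the modules that fail to import
copy libs\*.dll .venv\lib\site-packages\pandas\_libs\window\
copy libs\*.dll .venv\lib\site-packages\sklearn\.libs\
```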
Heads up! The `requirements.txt` is not complete on purpose. Use the setup scripts instead!
- Clone the repository:
  ```
  git clone https://github.com/yourusername/audiolab.git
  cd audiolab
  ```
- Set up a virtual environment:
  ```
  python -m venv venv
  source venv/bin/activate   # Windows: venv\Scripts\activate
  ```
- Run the setup script:
  ```
  ./setup.sh   # Windows: setup.bat
  ```
Common Issues & Fixes:
- Downgrade `pip` if installation fails:
  ```
  python -m pip install pip==24.0
  ```
- Install older CUDA drivers if needed: CUDA Toolkit Archive
- Install `fairseq` manually if necessary (quoted so the shell doesn't treat `>` as a redirect):
  ```
  pip install "fairseq>=0.12.2" --no-deps
  ```
- Activate your virtual environment:
  ```
  source venv/bin/activate   # Windows: venv\Scripts\activate.bat
  ```
- Run the application:
  ```
  python main.py
  ```
- Optional flags:
  - `--listen` – Bind to `0.0.0.0` for remote access.
  - `--port PORT` – Specify a custom port.
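For example, to expose the UI to other machines on a specific port (7860 is just an illustrative value here, not a documented default):

```
python main.py --listen --port 7860
```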
Generate high-quality sound effects, ambient audio, and musical samples from text descriptions:
- Text Prompting: Create sounds by describing them in natural language.
- Variable Duration: Generate audio up to 47 seconds long.
- Full Control: Adjust parameters like inference steps and guidance scale.
- Negative Prompts: Specify what to avoid in your generated audio.
- Multiple Variations: Generate different versions of the same prompt.
Example prompts:
- "A peaceful forest ambience with birds chirping and leaves rustling"
- "An electronic beat with pulsing bass at 120 BPM"
- "A sci-fi spaceship engine humming"
Create complete songs with vocals and instrumentals using state-of-the-art latent diffusion:
- Complete Songs: Generate full-length songs up to 4m45s.
- Lyrics Support: Add lyrics using LRC format with timestamps (see the example after this list).
- Style Control: Define the musical style using text prompts or reference audio.
- Blazingly Fast: Efficient generation compared to other music models.
- Memory Efficient: Chunked decoding option for consumer GPUs.
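If you haven't used LRC before, it's a plain-text lyrics format that pairs [mm:ss.xx] timestamps with lyric lines; a minimal made-up sketch:

```
[00:10.00] First line of the opening verse
[00:14.50] Second line lands on the next bar
[00:19.25] And the chorus starts right here
```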
Example use cases:
- Create original songs in any genre with your own lyrics
- Generate background music for videos with specific moods
- Experiment with unique musical styles and vocal characteristics
Generate natural-sounding speech with LLM-powered text-to-speech capabilities:
- Real-time Processing: Instantaneous speech generation.
- Voice Cloning: Create custom voice models from your recordings.
- Emotion Control: Adjust speaking style for more expressive speech.
- Multilingual Support: Generate speech in multiple languages.
- Style Variety: Create different styles from a single voice model.
Example applications:
- Create audiobooks with natural narration
- Develop voice assistants with your own voice
- Generate voiceovers for videos and presentations
- Create accessible content for those with reading difficulties
Convert audio recordings to text with speaker identification and precise timing:
- Speaker Diarization: Automatically identify and label different speakers.
- Word-Level Timestamps: Create perfectly aligned text with audio timing.
- Multilingual Support: Transcribe content in multiple languages.
- Batch Processing: Process multiple audio files in sequence.
- Multiple Output Formats: Generate both JSON metadata and readable text (see the sketch after this list).
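The exact schema isn't documented here, but diarized, word-aligned transcription output typically looks something like this hypothetical excerpt (field names and values are illustrative, not a guaranteed AudioLab format):

```json
{
  "segments": [
    {
      "start": 0.42,
      "end": 2.96,
      "speaker": "SPEAKER_00",
      "text": "Welcome back to the show.",
      "words": [
        {"word": "Welcome", "start": 0.42, "end": 0.80},
        {"word": "back", "start": 0.81, "end": 1.02},
        {"word": "to", "start": 1.03, "end": 1.12},
        {"word": "the", "start": 1.13, "end": 1.24},
        {"word": "show.", "start": 1.25, "end": 1.70}
      ]
    }
  ]
}
```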
Example applications:
- Create subtitles for videos with speaker labels
- Transcribe interviews and meetings with speaker attribution
- Generate searchable archives of audio content
- Create training data for voice and speech models
The heart of AudioLab: modular audio processing through a chain of wrappers:
- Separate: Split audio into vocals, drums, bass, and other instruments.
- Clone: Apply voice conversion with trained models.
- Remaster: Enhance audio based on reference tracks.
- Super Resolution: Improve audio detail and clarity.
- Merge: Mix separate audio tracks with complete control.
- Convert: Change audio formats with customizable settings.
Example workflows:
- Extract vocals → Apply voice clone → Merge with original instruments
- Split song → Enhance each component → Remix with new levels
- Remaster old recordings using modern reference tracks
Train custom voice models for voice conversion and cloning:
- One-Click Process: Simplified training with automatic preprocessing.
- Advanced Options: Fine-tune training for specific voice characteristics.
- Training Visualization: Monitor progress in real-time.
- Model Management: Organize and share your trained voice models.
Example applications:
- Create virtual versions of your own voice
- Develop character voices for games or animations
- Restore or enhance historical recordings
AudioLab is powered by some fantastic open-source projects:
- python-audio-separator – Core for audio separation.
- matchering – Professional-grade remastering.
- versatile-audio-super-resolution – High-quality audio enhancement.
- Real-Time-Voice-Cloning – Voice cloning.
- MVSEP-MDX23 – Music separation.
- WhisperX – Audio transcription.
- Coqui TTS – State-of-the-art TTS.
- YuE – Music generation.
- Zonos – High-quality TTS.
- Stable Audio – Text-to-audio generation.
- DiffRhythm – Full-length song generation with latent diffusion.
- Orpheus-TTS – Real-time high-quality text-to-speech.
Want to help? Check out the Contributing Guide!
Licensed under MIT. See LICENSE for details.
Made with ❤️ by the AudioLab team (AKA D8ahazard).