
AudioLab

AudioLab Logo

MIT License · Python · CUDA · Contributions Welcome

Huge thanks to RunDiffusion for supporting this project! 🎉

AudioLab is an open-source powerhouse for voice-cloning and audio separation, built with modularity and extensibility in mind. Whether you're an audio engineer, researcher, or just a curious tinkerer, AudioLab has you covered.


🌟 Features

🎵 Audio Processing Capabilities

  • 🎼 Music Generation: Create music from scratch or remix existing tracks using YuE.
  • 🎵 Song Generation: Create full-length songs with vocals and instrumentals using DiffRhythm.
  • 🗣️ Zonos Text-to-Speech: High-quality TTS with deep learning.
  • 🎭 Orpheus TTS: Real-time natural-sounding speech powered by large language models.
  • 📢 Text-to-Speech: Clone voices and generate natural-sounding speech with Coqui TTS.
  • 🔊 Text-to-Audio: Generate sound effects and ambient audio from text descriptions using Stable Audio.
  • 🎛️ Audio Separation: Isolate vocals, drums, bass, and other components from a track.
  • 🎤 Vocal Isolation: Distinguish lead vocals from background.
  • 🔇 Noise Removal: Get rid of echo, crowd noise, and unwanted sounds.
  • 🧬 Voice Cloning: Train high-quality voice models with just 30-60 minutes of data.
  • 🚀 Audio Super Resolution: Enhance and clean up audio.
  • 🎚️ Remastering: Apply spectral characteristics from a reference track.
  • 🔄 Audio Conversion: Convert between popular formats effortlessly.
  • 📜 Export to DAW: Easily create Ableton Live and Reaper projects from separated stems.

🤖 Automation Features

  • Auto-preprocessing for voice model training.
  • Merge separated sources back into a single file with ease.

🛠️ Pre-requisites

Before you dive in, make sure you have:

  1. Python 3.10 – Because match statements exist, and fairseq is allergic to 3.11.
  2. CUDA 12.4 – Other versions? Maybe fine. Maybe not. Do you like surprises?
  3. Virtual Environment – Strongly recommended to avoid dependency chaos.
  4. Windows Users – You're in for an adventure! Zonos/Triton can be a pain. Make sure to install MSVC and add these paths to your environment variables:
    C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64
    C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx86\x86
    

Note: This project assumes basic Python knowledge. If you've never set up a virtual environment before... now's the time to learn! 🚀


🚑 Windows Troubleshooting

If dependencies refuse to install on Windows, try the following:

  • Install the MSVC Build Tools (see the paths listed under Pre-requisites).
  • Ensure CUDA 12.4 is correctly installed (a quick sanity check is shown below this list).
  • DLL Errors? Try moving necessary DLLs from /libs to:
    .venv\lib\site-packages\pandas\_libs\window
    .venv\lib\site-packages\sklearn\.libs
    C:\Program Files\Python310\ (or wherever your Python is installed)
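
A quick way to sanity-check the CUDA setup from inside the virtual environment (this assumes the setup script has already installed PyTorch):

    python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"

If this prints False or fails to import, fix the CUDA/driver installation before chasing other errors.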
    

🚀 Installation

Heads up! The requirements.txt is not complete on purpose. Use the setup scripts instead!

🛠 Steps

  1. Clone the repository:
    git clone https://github.com/d8ahazard/AudioLab.git
    cd AudioLab
  2. Set up a virtual environment:
    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
  3. Run the setup script:
    ./setup.sh  # Windows: setup.bat

Common Issues & Fixes:

  • Downgrade pip if installation fails:
    python -m pip install pip==24.0
  • Install older CUDA drivers if needed: CUDA Toolkit Archive
  • Install fairseq manually if necessary:
    pip install "fairseq>=0.12.2" --no-deps

🎛️ Running AudioLab

  1. Activate your virtual environment:
    source venv/bin/activate  # Windows: venv\Scripts\activate.bat
  2. Run the application:
    python main.py
  3. Optional flags:
    • --listen → Bind to 0.0.0.0 for remote access.
    • --port PORT → Specify a custom port.
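
For example, to expose the UI to other machines on a custom port (7860 here is purely an illustrative choice, not a documented default):

    python main.py --listen --port 7860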

📸 Screenshots

Screenshot 1 Screenshot 2
Screenshot 3 Screenshot 4
Screenshot 5

💻 Key Features

Sound Forge: Text-to-Audio Generation

Generate high-quality sound effects, ambient audio, and musical samples from text descriptions:

  • 🔊 Text Prompting: Create sounds by describing them in natural language.
  • ⏱️ Variable Duration: Generate audio up to 47 seconds long.
  • 🎛️ Full Control: Adjust parameters like inference steps and guidance scale.
  • 🎭 Negative Prompts: Specify what to avoid in your generated audio.
  • 🎲 Multiple Variations: Generate different versions of the same prompt.

Example prompts:

  • "A peaceful forest ambience with birds chirping and leaves rustling"
  • "An electronic beat with pulsing bass at 120 BPM"
  • "A sci-fi spaceship engine humming"

DiffRhythm: Full-Length Song Generation

Create complete songs with vocals and instrumentals using state-of-the-art latent diffusion:

  • 🎵 Complete Songs: Generate full-length songs up to 4m45s.
  • 🎤 Lyrics Support: Add lyrics using LRC format with timestamps (see the snippet below).
  • 🎹 Style Control: Define the musical style using text prompts or reference audio.
  • ⚡ Blazingly Fast: Efficient generation compared to other music models.
  • 💾 Memory Efficient: Chunked decoding option for consumer GPUs.

Example use cases:

  • Create original songs in any genre with your own lyrics
  • Generate background music for videos with specific moods
  • Experiment with unique musical styles and vocal characteristics
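
Lyrics are plain text in LRC format: each line carries a [mm:ss.xx] timestamp marking when it should be sung. A minimal illustrative snippet (timestamps and words are made up):

    [00:10.00] First verse line goes here
    [00:14.50] Second verse line follows
    [00:19.20] And the chorus lands right here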

Orpheus TTS: Real-time Speech Synthesis

Generate natural-sounding speech with LLM-powered text-to-speech capabilities:

  • ⚡ Real-time Processing: Low-latency speech generation.
  • 🗣️ Voice Cloning: Create custom voice models from your recordings.
  • 😀 Emotion Control: Adjust speaking style for more expressive speech.
  • 🌐 Multilingual Support: Generate speech in multiple languages.
  • 🎭 Style Variety: Create different styles from a single voice model.

Example applications:

  • Create audiobooks with natural narration
  • Develop voice assistants with your own voice
  • Generate voiceovers for videos and presentations
  • Create accessible content for those with reading difficulties

Transcribe: Advanced Speech-to-Text

Convert audio recordings to text with speaker identification and precise timing:

  • 👥 Speaker Diarization: Automatically identify and label different speakers.
  • ⏱️ Word-Level Timestamps: Create perfectly aligned text with audio timing.
  • 🌍 Multilingual Support: Transcribe content in multiple languages.
  • 📊 Batch Processing: Process multiple audio files in sequence.
  • 📋 Multiple Output Formats: Generate both JSON metadata and readable text.

Example applications:

  • Create subtitles for videos with speaker labels
  • Transcribe interviews and meetings with speaker attribution
  • Generate searchable archives of audio content
  • Create training data for voice and speech models
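
Roughly, the transcription flow is transcribe, then word-level alignment, then diarization and speaker assignment. As a sketch only, assuming a whisperx-style backend (the backend choice, model name, audio path, and HF token are assumptions/placeholders, not documented AudioLab internals):

    import whisperx

    device = "cuda"
    audio_file = "interview.wav"  # placeholder path

    # 1. Transcribe in batches
    model = whisperx.load_model("large-v2", device, compute_type="float16")
    audio = whisperx.load_audio(audio_file)
    result = model.transcribe(audio, batch_size=16)

    # 2. Align to get word-level timestamps
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    result = whisperx.align(result["segments"], align_model, metadata, audio, device)

    # 3. Diarize and attach speaker labels
    diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
    diarize_segments = diarize_model(audio)
    result = whisperx.assign_word_speakers(diarize_segments, result)

    for seg in result["segments"]:
        print(seg.get("speaker", "UNKNOWN"), seg["start"], seg["text"])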

Process Tab: Audio Processing Pipeline

The heart of AudioLab: modular audio processing through a chain of wrappers:

  • 🔊 Separate: Split audio into vocals, drums, bass, and other instruments.
  • 🎤 Clone: Apply voice conversion with trained models.
  • ⚡ Remaster: Enhance audio based on reference tracks.
  • 🔬 Super Resolution: Improve audio detail and clarity.
  • 🔀 Merge: Mix separate audio tracks with complete control.
  • 🔄 Convert: Change audio formats with customizable settings.

Example workflows:

  • Extract vocals → Apply voice clone → Merge with original instruments
  • Split song → Enhance each component → Remix with new levels
  • Remaster old recordings using modern reference tracks
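
Conceptually, each stage is a wrapper that consumes a list of audio files and returns a new list, and the Process tab runs the selected wrappers in order. A purely hypothetical sketch of that idea (class and function names below are illustrative, not AudioLab's actual API):

    from pathlib import Path

    class Wrapper:
        """Hypothetical stage interface: consume file paths, produce new ones."""
        def process(self, inputs: list[Path]) -> list[Path]:
            raise NotImplementedError

    def run_chain(inputs: list[Path], chain: list[Wrapper]) -> list[Path]:
        # Each stage's outputs feed the next, mirroring workflows like
        # separate -> clone -> merge from the examples above.
        outputs = inputs
        for stage in chain:
            outputs = stage.process(outputs)
        return outputs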

RVC Training: Voice Model Creation

Train custom voice models for voice conversion and cloning:

  • 🎯 One-Click Process: Simplified training with automatic preprocessing.
  • ⚙️ Advanced Options: Fine-tune training for specific voice characteristics.
  • 📊 Training Visualization: Monitor progress in real-time.
  • 🔄 Model Management: Organize and share your trained voice models.

Example applications:

  • Create virtual versions of your own voice
  • Develop character voices for games or animations
  • Restore or enhance historical recordings

🀝 Acknowledgements

AudioLab is powered by fantastic open-source projects, including YuE, DiffRhythm, Zonos, Orpheus TTS, Coqui TTS, Stable Audio, and RVC, among others.


🌟 Contribute

Want to help? Check out the Contributing Guide!


📜 License

Licensed under MIT. See LICENSE for details.


Made with ❤️ by the AudioLab team. (AKA D8ahazard)
