Merge branch 'remsky:master' into master
eschmidbauer authored Jan 16, 2025
2 parents 20952d4 + 7711c32 commit 5cd66cd
Showing 17 changed files with 52 additions and 45 deletions.
15 changes: 1 addition & 14 deletions .gitignore
@@ -34,23 +34,10 @@ ENV/

# Project specific
# Model files
*.pt

*.pth
*.tar*

# Voice files
api/src/voices/af_bella.pt
api/src/voices/af_nicole.pt
api/src/voices/af_sarah.pt
api/src/voices/af_sky.pt
api/src/voices/af.pt
api/src/voices/am_adam.pt
api/src/voices/am_michael.pt
api/src/voices/bf_emma.pt
api/src/voices/bf_isabella.pt
api/src/voices/bm_george.pt
api/src/voices/bm_lewis.pt

# Audio files
examples/*.wav
examples/*.pcm
55 changes: 33 additions & 22 deletions README.md
@@ -5,50 +5,50 @@
# <sub><sub>_`FastKoko`_ </sub></sub>
[![Tests](https://img.shields.io/badge/tests-117%20passed-darkgreen)]()
[![Coverage](https://img.shields.io/badge/coverage-60%25-grey)]()
[![Tested at Model Commit](https://img.shields.io/badge/last--tested--model--commit-a67f113-blue)](https://huggingface.co/hexgrad/Kokoro-82M/tree/c3b0d86e2a980e027ef71c28819ea02e351c2667) [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero) [![Buy Me A Coffee](https://img.shields.io/badge/BMC-✨☕-gray?style=flat-square)](https://www.buymeacoffee.com/remsky)
[![Tested at Model Commit](https://img.shields.io/badge/last--tested--model--commit-a67f113-blue)](https://huggingface.co/hexgrad/Kokoro-82M/tree/c3b0d86e2a980e027ef71c28819ea02e351c2667) [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero)

Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model
- OpenAI-compatible Speech endpoint, with inline voice combination functionality
- NVIDIA GPU accelerated or CPU Onnx inference
- very fast generation time
- 100x+ real time speed via HF A100
- 35-50x+ real time speed via 4060Ti
- 35x-100x+ real time speed via 4060Ti+
- 5x+ real time speed via M3 Pro CPU
- streaming support w/ variable chunking to control latency & artifacts
- simple audio generation web ui utility
- (new) phoneme endpoints for conversion and generation

- phoneme, simple audio generation web ui utility
- Runs on an 80mb-300mb model (CUDA container + 5gb on disk due to drivers)

## Quick Start

The service can be accessed through either the API endpoints or the Gradio web interface.

1. Install prerequisites:
1. Install prerequisites, and start the service using Docker Compose (Full setup including UI):
- Install [Docker Desktop](https://www.docker.com/products/docker-desktop/)
- Clone the repository:
```bash
git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI
```

# * Switch to stable branch if any issues *
git checkout v0.0.5post1-stable

2. Start the service:

- Using Docker Compose (Full setup including UI):
```bash
cd docker/gpu # OR
# cd docker/cpu # Run this or the above
docker compose up --build
```
- OR running the API alone using Docker (model + voice packs baked in):
```bash
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest # CPU
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest # Nvidia GPU
# Minified versions are available with `:latest-slim` tag, though it is a first test and may not be functional
```

Once started:
- The API will be available at http://localhost:8880
- The UI can be accessed at http://localhost:7860

__Or__ running the API alone using Docker (model + voice packs baked in) (Most Recent):

```bash
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.0post1 # CPU
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.0post1 # Nvidia GPU
```


2. Run locally as an OpenAI-Compatible Speech Endpoint
4. Run locally as an OpenAI-Compatible Speech Endpoint
```python
from openai import OpenAI
client = OpenAI(
@@ -183,8 +183,19 @@ If you only want the API, just comment out everything in the docker-compose.yml
Currently, voices created via the API are accessible here, but voice combination/creation has not yet been added
*Note: Recent updates for streaming could lead to temporary glitches. If so, pull from the most recent stable release v0.0.2 to restore*
Running the UI Docker Service
- If you only want to run the Gradio web interface separately and connect it to an existing API service:
```bash
docker run -p 7860:7860 \
-e API_HOST=<api-hostname-or-ip> \
-e API_PORT=8880 \
ghcr.io/remsky/kokoro-fastapi-ui:v0.1.0
```
- Replace `<api-hostname-or-ip>` with:
- `kokoro-tts` if the UI container is running in the same Docker Compose setup.
- `localhost` if the API is running on your local machine.
### Disabling Local Saving
You can disable local saving of audio files and hide the file view in the UI by setting the `DISABLE_LOCAL_SAVING` environment variable to `true`. This is useful when running the service on a server where you don't want to store generated audio files locally.
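The commit does not show how the UI reads `DISABLE_LOCAL_SAVING`. As a minimal sketch, assuming the value is compared case-insensitively against `true` and defaults to `false` (as in the compose files), a hypothetical helper `local_saving_disabled` could look like this:

```python
import os
from typing import Mapping


def local_saving_disabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical parser (not from the commit).

    Returns True only when DISABLE_LOCAL_SAVING is 'true' in any letter case;
    a missing variable or any other value counts as False.
    """
    return env.get("DISABLE_LOCAL_SAVING", "false").strip().lower() == "true"


print(local_saving_disabled({"DISABLE_LOCAL_SAVING": "true"}))
print(local_saving_disabled({}))
```

Passing the environment as a mapping keeps the check testable without modifying the real process environment.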
Binary file added api/src/voices/af.pt
Binary file not shown.
Binary file added api/src/voices/af_bella.pt
Binary file not shown.
Binary file added api/src/voices/af_nicole.pt
Binary file not shown.
Binary file added api/src/voices/af_sarah.pt
Binary file not shown.
Binary file added api/src/voices/af_sky.pt
Binary file not shown.
Binary file added api/src/voices/am_adam.pt
Binary file not shown.
Binary file added api/src/voices/am_michael.pt
Binary file not shown.
Binary file added api/src/voices/bf_emma.pt
Binary file not shown.
Binary file added api/src/voices/bf_isabella.pt
Binary file not shown.
Binary file added api/src/voices/bm_george.pt
Binary file not shown.
Binary file added api/src/voices/bm_lewis.pt
Binary file not shown.
10 changes: 6 additions & 4 deletions docker/cpu/docker-compose.yml
@@ -1,10 +1,10 @@
name: kokoro-tts
services:
kokoro-tts:
image: ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.0
# build:
# context: ../..
# dockerfile: docker/cpu/Dockerfile
# image: ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.0
build:
context: ../..
dockerfile: docker/cpu/Dockerfile
volumes:
- ../../api/src:/app/api/src
- ../../api/src/voices:/app/api/src/voices
@@ -35,3 +35,5 @@ services:
- GRADIO_WATCH=True # Enable hot reloading
- PYTHONUNBUFFERED=1 # Ensure Python output is not buffered
- DISABLE_LOCAL_SAVING=false # Set to 'true' to disable local saving and hide file view
- API_HOST=kokoro-tts # Set TTS service URL
- API_PORT=8880 # Set TTS service PORT
10 changes: 6 additions & 4 deletions docker/gpu/docker-compose.yml
@@ -1,10 +1,10 @@
name: kokoro-tts
services:
kokoro-tts:
image: ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.0
# build:
# context: ../..
# dockerfile: docker/gpu/Dockerfile
# image: ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.0
build:
context: ../..
dockerfile: docker/gpu/Dockerfile
volumes:
- ../../api/src:/app/api/src # Mount src for development
- ../../api/src/voices:/app/api/src/voices # Mount voices for persistence
@@ -35,3 +35,5 @@ services:
- GRADIO_WATCH=1 # Enable hot reloading
- PYTHONUNBUFFERED=1 # Ensure Python output is not buffered
- DISABLE_LOCAL_SAVING=false # Set to 'true' to disable local saving and hide file view
- API_HOST=kokoro-tts # Set TTS service URL
- API_PORT=8880 # Set TTS service PORT
3 changes: 3 additions & 0 deletions ui/Dockerfile
@@ -11,5 +11,8 @@ RUN mkdir -p data/inputs data/outputs
# Copy the application files
COPY . .

ENV API_HOST=kokoro-tts
ENV API_PORT=8880

# Run the Gradio app
CMD ["python", "app.py"]
4 changes: 3 additions & 1 deletion ui/lib/config.py
@@ -1,7 +1,9 @@
import os

# API Configuration
API_URL = "http://kokoro-tts:8880"
API_HOST = os.getenv("API_HOST", "kokoro-tts")
API_PORT = os.getenv("API_PORT", "8880")
API_URL = f"http://{API_HOST}:{API_PORT}"

# File paths
INPUTS_DIR = "/app/ui/data/inputs"
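
The `ui/lib/config.py` hunk replaces the hard-coded URL with values read from the environment, falling back to `kokoro-tts` and `8880`. A minimal sketch of the same lookup follows, wrapped in a hypothetical `build_api_url` helper (not part of the commit) so the fallback behaviour can be exercised with a plain dictionary:

```python
import os
from typing import Mapping


def build_api_url(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper mirroring the API_HOST / API_PORT lookup in the diff."""
    host = env.get("API_HOST", "kokoro-tts")  # default: Compose service name
    port = env.get("API_PORT", "8880")        # default: API port from the diff
    return f"http://{host}:{port}"


print(build_api_url({}))                          # http://kokoro-tts:8880
print(build_api_url({"API_HOST": "localhost"}))   # http://localhost:8880
```

With neither variable set, the result matches the previously hard-coded `http://kokoro-tts:8880`, so existing Compose setups should behave as before.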