Merge branch 'remsky:master' into master

remsky · Jan 16, 2025 · 5cd66cd
2 parents 20952d4 + 7711c32
commit 5cd66cd

Showing 17 changed files with 52 additions and 45 deletions.
diff --git a/.gitignore b/.gitignore
@@ -34,23 +34,10 @@ ENV/
 
 # Project specific
 # Model files
-*.pt
+
 *.pth
 *.tar*
 
-# Voice files
-api/src/voices/af_bella.pt
-api/src/voices/af_nicole.pt
-api/src/voices/af_sarah.pt
-api/src/voices/af_sky.pt
-api/src/voices/af.pt
-api/src/voices/am_adam.pt
-api/src/voices/am_michael.pt
-api/src/voices/bf_emma.pt
-api/src/voices/bf_isabella.pt
-api/src/voices/bm_george.pt
-api/src/voices/bm_lewis.pt
-
 # Audio files
 examples/*.wav
 examples/*.pcm

diff --git a/README.md b/README.md
@@ -5,50 +5,50 @@
 # <sub><sub>_`FastKoko`_ </sub></sub>
 [![Tests](https://img.shields.io/badge/tests-117%20passed-darkgreen)]()
 [![Coverage](https://img.shields.io/badge/coverage-60%25-grey)]()
-[![Tested at Model Commit](https://img.shields.io/badge/last--tested--model--commit-a67f113-blue)](https://huggingface.co/hexgrad/Kokoro-82M/tree/c3b0d86e2a980e027ef71c28819ea02e351c2667) [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero) [![Buy Me A Coffee](https://img.shields.io/badge/BMC-✨☕-gray?style=flat-square)](https://www.buymeacoffee.com/remsky)
+[![Tested at Model Commit](https://img.shields.io/badge/last--tested--model--commit-a67f113-blue)](https://huggingface.co/hexgrad/Kokoro-82M/tree/c3b0d86e2a980e027ef71c28819ea02e351c2667) [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero)
 
 Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model
 - OpenAI-compatible Speech endpoint, with inline voice combination functionality
 - NVIDIA GPU accelerated or CPU Onnx inference 
 - very fast generation time
-  - 100x+ real time speed via HF A100
-  - 35-50x+ real time speed via 4060Ti
+  - 35x-100x+ real time speed via 4060Ti+
   - 5x+ real time speed via M3 Pro CPU
 - streaming support w/ variable chunking to control latency & artifacts
-- simple audio generation web ui utility
-- (new) phoneme endpoints for conversion and generation
-
+- phoneme, simple audio generation web ui utility
+- Runs on an 80mb-300mb model (CUDA container + 5gb on disk due to drivers)  
 
 ## Quick Start
 
 The service can be accessed through either the API endpoints or the Gradio web interface.
 
-1. Install prerequisites:
+1. Install prerequisites, and start the service using Docker Compose (Full setup including UI):
    - Install [Docker Desktop](https://www.docker.com/products/docker-desktop/)
    - Clone the repository:
         ```bash
         git clone https://github.com/remsky/Kokoro-FastAPI.git
         cd Kokoro-FastAPI
-        ```
+
+        #   * Switch to stable branch if any issues *
+        git checkout v0.0.5post1-stable
 
-2. Start the service:
-
-   - Using Docker Compose (Full setup including UI):
-        ```bash
         cd docker/gpu # OR 
         # cd docker/cpu # Run this or the above
         docker compose up --build 
         ```
-   - OR running the API alone using Docker (model + voice packs baked in):
-        ```bash
-
-        docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest # CPU
-        docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest # Nvidia GPU
-        # Minified versions are available with `:latest-slim` tag, though it is a first test and may not be functional
-        ```
+
+      Once started:
+     - The API will be available at http://localhost:8880
+     - The UI can be accessed at http://localhost:7860
+
+  __Or__ running the API alone using Docker (model + voice packs baked in) (Most Recent):
+
+  ```bash
+  docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.0post1 # CPU 
+  docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.0post1 # Nvidia GPU
+  ```
 
 
-2. Run locally as an OpenAI-Compatible Speech Endpoint
+4. Run locally as an OpenAI-Compatible Speech Endpoint
     ```python
     from openai import OpenAI
     client = OpenAI(
@@ -183,8 +183,19 @@ If you only want the API, just comment out everything in the docker-compose.yml
 
 Currently, voices created via the API are accessible here, but voice combination/creation has not yet been added
 
-*Note: Recent updates for streaming could lead to temporary glitches. If so, pull from the most recent stable release v0.0.2 to restore*
-
+Running the UI Docker Service
+   - If you only want to run the Gradio web interface separately and connect it to an existing API service:
+      ```bash
+      docker run -p 7860:7860 \
+        -e API_HOST=<api-hostname-or-ip> \
+        -e API_PORT=8880 \
+        ghcr.io/remsky/kokoro-fastapi-ui:v0.1.0
+      ```
+
+     - Replace `<api-hostname-or-ip>` with:
+       - `kokoro-tts` if the UI container is running in the same Docker Compose setup.
+       - `localhost` if the API is running on your local machine.
+  
 ### Disabling Local Saving
 
 You can disable local saving of audio files and hide the file view in the UI by setting the `DISABLE_LOCAL_SAVING` environment variable to `true`. This is useful when running the service on a server where you don't want to store generated audio files locally.

diff --git a/api/src/voices/af.pt b/api/src/voices/af.pt
diff --git a/api/src/voices/af_bella.pt b/api/src/voices/af_bella.pt
diff --git a/api/src/voices/af_nicole.pt b/api/src/voices/af_nicole.pt
diff --git a/api/src/voices/af_sarah.pt b/api/src/voices/af_sarah.pt
diff --git a/api/src/voices/af_sky.pt b/api/src/voices/af_sky.pt
diff --git a/api/src/voices/am_adam.pt b/api/src/voices/am_adam.pt
diff --git a/api/src/voices/am_michael.pt b/api/src/voices/am_michael.pt
diff --git a/api/src/voices/bf_emma.pt b/api/src/voices/bf_emma.pt
diff --git a/api/src/voices/bf_isabella.pt b/api/src/voices/bf_isabella.pt
diff --git a/api/src/voices/bm_george.pt b/api/src/voices/bm_george.pt
diff --git a/api/src/voices/bm_lewis.pt b/api/src/voices/bm_lewis.pt
diff --git a/docker/cpu/docker-compose.yml b/docker/cpu/docker-compose.yml
@@ -1,10 +1,10 @@
 name: kokoro-tts
 services:
   kokoro-tts:
-    image: ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.0
-    # build:
-    #   context: ../..
-    #   dockerfile: docker/cpu/Dockerfile
+    # image: ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.0
+    build:
+      context: ../..
+      dockerfile: docker/cpu/Dockerfile
     volumes:
       - ../../api/src:/app/api/src
       - ../../api/src/voices:/app/api/src/voices
@@ -35,3 +35,5 @@ services:
       - GRADIO_WATCH=True  # Enable hot reloading
       - PYTHONUNBUFFERED=1  # Ensure Python output is not buffered
       - DISABLE_LOCAL_SAVING=false  # Set to 'true' to disable local saving and hide file view
+      - API_HOST=kokoro-tts  # Set TTS service URL
+      - API_PORT=8880  # Set TTS service PORT
diff --git a/docker/gpu/docker-compose.yml b/docker/gpu/docker-compose.yml
@@ -1,10 +1,10 @@
 name: kokoro-tts
 services:
   kokoro-tts:
-    image: ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.0
-    # build:
-    #   context: ../..
-    #   dockerfile: docker/gpu/Dockerfile
+    # image: ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.0
+    build:
+      context: ../..
+      dockerfile: docker/gpu/Dockerfile
     volumes:
       - ../../api/src:/app/api/src  # Mount src for development
       - ../../api/src/voices:/app/api/src/voices  # Mount voices for persistence
@@ -35,3 +35,5 @@ services:
       - GRADIO_WATCH=1  # Enable hot reloading
       - PYTHONUNBUFFERED=1  # Ensure Python output is not buffered
       - DISABLE_LOCAL_SAVING=false  # Set to 'true' to disable local saving and hide file view
+      - API_HOST=kokoro-tts  # Set TTS service URL
+      - API_PORT=8880  # Set TTS service PORT
diff --git a/ui/Dockerfile b/ui/Dockerfile
@@ -11,5 +11,8 @@ RUN mkdir -p data/inputs data/outputs
 # Copy the application files
 COPY . .
 
+ENV API_HOST=kokoro-tts
+ENV API_PORT=8880
+
 # Run the Gradio app
 CMD ["python", "app.py"]
diff --git a/ui/lib/config.py b/ui/lib/config.py
@@ -1,7 +1,9 @@
 import os
 
 # API Configuration
-API_URL = "http://kokoro-tts:8880"
+API_HOST = os.getenv("API_HOST", "kokoro-tts")
+API_PORT = os.getenv("API_PORT", "8880")
+API_URL = f"http://{API_HOST}:{API_PORT}"
 
 # File paths
 INPUTS_DIR = "/app/ui/data/inputs"
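
The `ui/lib/config.py` change above replaces a hard-coded URL with one assembled from the `API_HOST` and `API_PORT` environment variables, falling back to the previous values when they are unset. A minimal sketch of that lookup, assuming only what the diff shows: the helper name `build_api_url` and its `env` parameter are hypothetical and exist only so the behaviour can be exercised without touching the real process environment.

```python
import os


def build_api_url(env=None):
    """Build the API base URL the way the diff's config.py does.

    `build_api_url` and `env` are illustrative names, not part of the repo.
    Passing a plain dict lets the defaults be checked in isolation.
    """
    env = os.environ if env is None else env
    # Same defaults as in the diff: the Compose service name and port 8880.
    host = env.get("API_HOST", "kokoro-tts")
    port = env.get("API_PORT", "8880")
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # No overrides: falls back to the Compose service name.
    print(build_api_url({}))
    # Override only the host, as the README suggests for a locally running API.
    print(build_api_url({"API_HOST": "localhost"}))
```

With no variables set, the result matches the old hard-coded value, so existing Compose setups should keep working; setting `API_HOST=localhost` points the UI at an API running on the local machine, as described in the README hunk.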