speechToSpeechLLM

A free, open-source speech-to-speech pipeline: spoken input is transcribed, answered by a local LLM, and spoken back.

The composite backend consists of three services:

KoboldCpp (NeuralBeagle 7B) on port 5001
Coqui TTS on port 5002
whisper.cpp on port 8080

To start all of them, run:

chmod 555 run_entire_build.sh
./run_entire_build.sh
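Once the build is running, one quick way to confirm that all three backends are listening is to open a TCP connection to each port listed above. The Python sketch below is not part of this repository. It only assumes the services are reachable on localhost with the default port mappings (5001, 5002, 8080).

```python
import socket

# Ports from the backend list; adjust if your compose file maps them differently.
BACKENDS = {
    "KoboldCpp (LLM)": 5001,
    "Coqui TTS": 5002,
    "whisper.cpp (STT)": 8080,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the service as down.
        return False


def check_backends(host: str = "localhost") -> dict:
    """Map each backend name to whether its port currently accepts connections."""
    return {name: port_is_open(host, port) for name, port in BACKENDS.items()}


if __name__ == "__main__":
    for name, ok in check_backends().items():
        print(f"{name:<20} {'up' if ok else 'DOWN'}")
```

A DOWN entry only means nothing accepted a connection on that port; an open port does not prove the model behind it has finished loading.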

To stop the build, run:

chmod 555 prune_entire_build.sh
./prune_entire_build.sh

The user-facing application is currently a proof of concept (POC): a simple R Shiny app that passes data between the backends. It is built for macOS only, because it relies on the 'rec' command (part of SoX, which macOS does not ship by default) to record audio input.

Porting it to Windows should be straightforward with a tool such as ffmpeg. Recording from an audio device on Linux is still to be determined.
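The recording step can be expressed as one command per platform. The sketch below is an assumption, not the app's code: the macOS branch uses SoX's `rec`, the Windows branch uses ffmpeg's DirectShow input with a placeholder device name, and the Linux branch uses ALSA's `arecord`, which is only one possible answer to the open Linux question. All three request 16 kHz, 16-bit mono WAV, the input format whisper.cpp's examples expect.

```python
import sys


def record_command(out_path: str, seconds: int, os_name: str = sys.platform,
                   device: str = "Microphone") -> list:
    """Build a command that records `seconds` of 16 kHz mono audio to out_path.

    `device` is a hypothetical placeholder used only on Windows.
    """
    if os_name == "darwin":
        # SoX rec: -c channels, -r sample rate, -b bit depth; `trim 0 N` stops after N seconds.
        return ["rec", "-c", "1", "-r", "16000", "-b", "16", out_path,
                "trim", "0", str(seconds)]
    if os_name.startswith("win"):
        # ffmpeg DirectShow capture; list real device names with
        # `ffmpeg -list_devices true -f dshow -i dummy`.
        return ["ffmpeg", "-y", "-f", "dshow", "-i", f"audio={device}",
                "-t", str(seconds), "-ac", "1", "-ar", "16000", out_path]
    if os_name.startswith("linux"):
        # ALSA arecord: 16-bit little-endian samples, 16 kHz, mono, fixed duration.
        return ["arecord", "-f", "S16_LE", "-r", "16000", "-c", "1",
                "-d", str(seconds), out_path]
    raise ValueError(f"unsupported platform: {os_name}")
```

To record, pass the result to `subprocess.run(record_command("input.wav", 5), check=True)`. Returning an argument list rather than a shell string avoids quoting problems with device names that contain spaces.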

All APIs run independently of the R Shiny app, which is NOT packaged with the Docker Compose build. To set up the front-end environment, install R and the dependencies listed in the /rshiny_deps Dockerfile. This is more a philosophical interplay of technologies than a fully working POC.
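Because the APIs run independently of the Shiny front end, the whole round trip can also be driven from any HTTP client. The Python sketch below is a hypothetical stand-in for what the R app does, using only the standard library. It assumes each server's stock endpoint: whisper.cpp's `POST /inference` (multipart upload, `response_format=json`), KoboldCpp's KoboldAI-compatible `POST /api/v1/generate`, and the Coqui TTS demo server's `GET /api/tts?text=...`. The `User:`/`Assistant:` prompt template is also an assumption; check the chat format your NeuralBeagle build expects.

```python
import json
import urllib.parse
import urllib.request
import uuid

STT_URL = "http://localhost:8080/inference"        # whisper.cpp server (assumed endpoint)
LLM_URL = "http://localhost:5001/api/v1/generate"  # KoboldCpp (assumed endpoint)
TTS_URL = "http://localhost:5002/api/tts"          # Coqui TTS demo server (assumed endpoint)


def encode_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Encode form fields plus one WAV file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        for name, value in fields.items()
    ]
    header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
              f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n')
    parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_bytes: bytes) -> str:
    """Speech to text: upload the recording and return the transcript."""
    body, ctype = encode_multipart({"response_format": "json"}, "file", "input.wav", wav_bytes)
    req = urllib.request.Request(STT_URL, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"].strip()


def generate(prompt: str, max_length: int = 120) -> str:
    """Text to text: ask the LLM for a reply of at most `max_length` tokens."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode()
    req = urllib.request.Request(LLM_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"].strip()


def tts_url(text: str) -> str:
    """Build the GET URL that asks the TTS server to speak `text`."""
    return f"{TTS_URL}?{urllib.parse.urlencode({'text': text})}"


def synthesize(text: str) -> bytes:
    """Text to speech: return the WAV bytes produced by the TTS server."""
    with urllib.request.urlopen(tts_url(text)) as resp:
        return resp.read()


def speech_to_speech(wav_in: bytes) -> bytes:
    heard = transcribe(wav_in)
    reply = generate(f"User: {heard}\nAssistant:")  # hypothetical prompt template
    return synthesize(reply)


if __name__ == "__main__":
    with open("input.wav", "rb") as f:  # e.g. a clip recorded with `rec`
        reply_wav = speech_to_speech(f.read())
    with open("reply.wav", "wb") as f:
        f.write(reply_wav)
```

Keeping each hop as its own function mirrors the backend split: any one service can be swapped or tested without touching the other two.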

Demo: initial greeting (speech input)

Demo: follow-up message (speech input)

NOTE: The only part of this build that seems to need some troubleshooting is the Coqui image. If you run into latency issues during installation, run the build_coqui.sh script on its own to isolate that build. We hope to fix this in a future build. Once the image is built with the English model, it should run without problems.
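After building the Coqui image separately, a successful synthesis request is a stronger check than an open port. The sketch below assumes the Coqui TTS demo server's `GET /api/tts?text=...` endpoint on port 5002; it requests a short phrase until the server answers with HTTP 200 or the overall timeout expires. The function name and default timings are illustrative, not taken from this repository.

```python
import http.client
import time
import urllib.parse
import urllib.request


def wait_for_tts(base_url: str = "http://localhost:5002/api/tts",
                 timeout: float = 300.0, interval: float = 5.0,
                 request_timeout: float = 60.0) -> bool:
    """Return True once base_url synthesizes a short phrase with HTTP 200, False on timeout."""
    probe = f"{base_url}?{urllib.parse.urlencode({'text': 'ready'})}"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A generous per-request timeout, since the first synthesis can be slow.
            with urllib.request.urlopen(probe, timeout=request_timeout) as resp:
                if resp.status == 200:
                    resp.read()  # drain the WAV body
                    return True
        except (OSError, http.client.HTTPException):
            # Refused, reset, HTTP error status, or timeout: not ready yet.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("Coqui TTS ready" if wait_for_tts() else "Coqui TTS did not become ready in time")
```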
