This repo includes Python code for running inference with the Stable Audio Open 1.0 model. It can be run locally or hosted on Modal.
generate_audio_sample.py tweaks the provided prompt to steer the model toward a one-shot sample, such as a single drum hit. It then post-processes the model output to trim extra hits and fade the audio out smoothly.
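As a rough illustration of the trimming and fade-out idea, here is a minimal sketch in plain Python. It is not the code used in this repo: the function names, the amplitude threshold, and the silence length are all assumptions chosen for the example, and it works on a simple list of mono samples rather than the model's real output format.

```python
# Hypothetical sketch only: not the post-processing code shipped in this repo.

def trim_after_first_hit(samples, threshold=0.02, min_silence=2000):
    """Keep audio up to the first sustained quiet stretch after the first hit."""
    # Find the first sample loud enough to count as the start of the hit.
    start = next((i for i, s in enumerate(samples) if abs(s) >= threshold), None)
    if start is None:
        return list(samples)  # nothing above the threshold; leave the audio alone
    quiet_run = 0
    for i in range(start, len(samples)):
        if abs(samples[i]) < threshold:
            quiet_run += 1
            if quiet_run >= min_silence:
                # Cut at the start of the quiet stretch, dropping any later hits.
                return list(samples[: i - quiet_run + 1])
        else:
            quiet_run = 0
    return list(samples)


def fade_out(samples, fade_len):
    """Apply a linear fade over the last fade_len samples."""
    out = list(samples)
    fade_len = min(fade_len, len(out))
    for k in range(fade_len):
        # Gain ramps down linearly and reaches 0.0 on the final sample.
        gain = (fade_len - 1 - k) / fade_len
        out[len(out) - fade_len + k] *= gain
    return out
```

A real implementation would more likely operate on NumPy or torch arrays and use an energy envelope rather than raw sample amplitude, but the two steps (cut after the first hit, then fade the tail) are the same.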
In order to access the Stable Audio Open model, you'll need to:
- Create a Hugging Face account
- Navigate to the Stable Audio Open 1.0 model page and opt in to gain access to the model
- Create a Hugging Face access token with read access
- Copy the token and add it to your local environment under the name HF_TOKEN:
For zsh, add this to your ~/.zshrc:
export HF_TOKEN=myhftoken
For fish, add this to your fish config (e.g. ~/.config/fish/config.fish):
set -Ux HF_TOKEN myhftoken
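To confirm the token is visible to Python before running inference, a small check like the following can help. This is a sketch, not part of the repo: get_hf_token is a hypothetical helper name, and the only thing it relies on is the HF_TOKEN variable name described above.

```python
import os


def get_hf_token(env=os.environ):
    """Return the Hugging Face token from the environment, or raise if it is missing."""
    # Strip whitespace so a stray space or newline in the shell config is not silently kept.
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it to your shell config and open a new shell."
        )
    return token
```

Calling get_hf_token() in a fresh shell either returns the token or fails immediately with a clear message, which is easier to diagnose than an authentication error during model download.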
- Install miniconda: https://docs.conda.io/en/latest/miniconda.html
- Set up the conda environment:
conda env create -f environment.yml
- Activate it:
conda activate stable-audio-open-modal
To run inference locally, you can run generate_audio.py after activating the conda environment.
For example:
python generate_audio.py --prompt "Massive metallic techno kick drum"
This will generate a file called output_0.wav in the current directory.
To see a list of available arguments to customize inference, run:
python generate_audio.py -h
To deploy the app to run inference on Modal, you'll need to:
- Create a Modal account.
- Create a Hugging Face account and API token.
- Sign the agreement to use the Stable Audio Open 1.0 model.
- Set up secrets for the Modal app with the following environment variables:
  - HF_TOKEN: Your Hugging Face API token
  - AUTH_TOKEN: A Bearer auth token you create to authenticate requests to the Modal app
- Deploy the app with the following command:
modal deploy src/api.py
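The AUTH_TOKEN secret listed above is a value you create yourself. One way to produce a suitably random string is Python's standard secrets module, sketched below. The helper name make_auth_token and the 32-byte length are assumptions for illustration; any sufficiently long random string should work.

```python
import secrets


def make_auth_token(num_bytes=32):
    """Return a URL-safe random string suitable for use as a bearer token."""
    # token_urlsafe draws from the OS's cryptographically secure random source.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print a fresh token to paste into the Modal secret and your local environment.
    print(make_auth_token())
```

Store the same value in the Modal secret and in your local AUTH_TOKEN environment variable so that requests you send can be authenticated.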
Note: you can test the endpoint before deploying with the following command:
modal serve src/api.py
Then hit the endpoint with a POST request from your local machine. This assumes you have set the AUTH_TOKEN environment variable.
curl -X POST https://your-modal-endpoint.modal.run \
-H "Authorization: Bearer $AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Dub techno snare"
}' --output "modal-out.wav"
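The same request can be made from Python using only the standard library. The sketch below mirrors the curl call above: it sends the same JSON body and headers and writes the response bytes to a file. The function names are hypothetical, the endpoint URL is the same placeholder used in the curl example, and it assumes the endpoint returns raw WAV bytes, as the --output flag above suggests.

```python
import json
import os
import urllib.request


def build_request(endpoint, prompt, auth_token):
    """Build a POST request equivalent to the curl example."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
    )


def generate(endpoint, prompt, out_path="modal-out.wav"):
    """Send the prompt to the endpoint and save the response body to out_path."""
    # Reads the token from the AUTH_TOKEN environment variable, like the curl example.
    req = build_request(endpoint, prompt, os.environ["AUTH_TOKEN"])
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
    return out_path
```

For example, generate("https://your-modal-endpoint.modal.run", "Dub techno snare") would write modal-out.wav to the current directory once the placeholder URL is replaced with your real endpoint.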