Stable Audio Open Modal

This repo includes Python code for running inference with the Stable Audio Open 1.0 model, either locally or hosted on Modal.

generate_audio_sample.py tweaks the provided prompt to attempt to generate a one-shot sample like a drum hit. It then applies some post-processing to the model output to trim extra hits and fade out the audio smoothly.
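The exact post-processing lives in generate_audio_sample.py; the sketch below is only an illustration of the idea, using a hypothetical helper that operates on a mono float NumPy array (not the repo's actual code):

import numpy as np

def trim_and_fade(audio: np.ndarray, sample_rate: int, fade_ms: float = 50.0) -> np.ndarray:
    """Hypothetical sketch: cut the sample before any second transient, then fade out."""
    # Envelope follower: absolute value smoothed over ~10 ms windows
    window = max(1, int(sample_rate * 0.01))
    envelope = np.convolve(np.abs(audio), np.ones(window) / window, mode="same")
    above = envelope > envelope.max() * 0.5

    # First transient, then the point where its envelope decays back below the threshold
    first_hit = int(np.argmax(above)) if above.any() else 0
    decayed = np.flatnonzero(~above[first_hit:])
    hit_end = first_hit + int(decayed[0]) if decayed.size else audio.shape[-1]

    # If an extra hit shows up after the first one decays, trim just before it
    extra = np.flatnonzero(above[hit_end:])
    cut = hit_end + int(extra[0]) if extra.size else audio.shape[-1]
    trimmed = audio[:cut].copy()

    # Linear fade-out over the last fade_ms milliseconds
    fade_len = min(int(sample_rate * fade_ms / 1000.0), trimmed.shape[-1])
    if fade_len > 0:
        trimmed[-fade_len:] *= np.linspace(1.0, 0.0, fade_len)
    return trimmed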

Hugging Face setup

In order to access the Stable Audio Open model, you'll need to:

  1. Create a Hugging Face account
  2. Navigate to the Stable Audio Open 1.0 model page and opt in to gain access to the model
  3. Create a Hugging Face access token with read access
  4. Copy the token and add it to your local environment using the name HF_TOKEN:

For zsh, add this to your ~/.zshrc:

export HF_TOKEN=myhftoken

For fish, add this to your fish config (e.g. ~/.config/fish/config.fish):

set -Ux HF_TOKEN myhftoken
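
Once the token is set (open a new shell so the variable is picked up), you can optionally confirm that it works from Python. This quick check is not part of the repo's scripts and assumes huggingface_hub is installed:

import os
from huggingface_hub import whoami

# Prints your Hugging Face account info if HF_TOKEN is valid, raises otherwise
print(whoami(token=os.environ["HF_TOKEN"]))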

Local environment setup

  1. Install miniconda: https://docs.conda.io/en/latest/miniconda.html

  2. Set up the conda environment

    conda env create -f environment.yml
  3. Activate it

    conda activate stable-audio-open-modal

Running locally

To run inference locally, activate the conda environment and run generate_audio.py.

For example:

python generate_audio.py --prompt "Massive metallic techno kick drum"

This will generate a file called output_0.wav in the current directory.

To see a list of available arguments to customize inference, run:

python generate_audio.py -h
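
For reference, the core inference flow inside a script like generate_audio.py, using the stable-audio-tools library, looks roughly like the sketch below. It follows the library's documented example rather than the repo's exact code; the prompt, step count, clip length, and output handling are illustrative.

import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Downloads the gated checkpoint from Hugging Face (requires HF_TOKEN and prior opt-in)
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
model = model.to(device)

conditioning = [{
    "prompt": "Massive metallic techno kick drum",
    "seconds_start": 0,
    "seconds_total": 2,
}]

output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=model_config["sample_size"],
    device=device,
)

# Collapse the batch dimension, peak-normalize, and write a 16-bit WAV
output = rearrange(output, "b d n -> d (b n)").to(torch.float32)
output = (output / output.abs().max()).clamp(-1, 1)
torchaudio.save("output_0.wav", (output * 32767).to(torch.int16).cpu(), model_config["sample_rate"])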

Running on Modal

To deploy the app to run inference on Modal, you'll need to:

  1. Create a Modal account

  2. Create a Hugging Face account and API token.

  3. Sign the agreement to use the Stable Audio Open 1.0 model.

  4. Set up secrets for the Modal app with the following environment variables:

    • HF_TOKEN: Your Hugging Face API token
    • AUTH_TOKEN: A Bearer auth token you create to authenticate requests to the Modal app
  5. Deploy the app with the following command:

    modal deploy src/api.py

Note: you can test the endpoint before deploying with the following command:

modal serve src/api.py

Then hit the endpoint with a POST request from your machine. This assumes you have set the AUTH_TOKEN environment variable locally.

curl -X POST https://your-modal-endpoint.modal.run \
  -H "Authorization: Bearer "$AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Dub techno snare"
  }' --output "modal-out.wav"
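
For reference, the overall shape of a Modal app exposing this kind of endpoint is sketched below. This is not the repo's actual src/api.py: the secret name, GPU type, image contents, and the run_inference helper are all illustrative assumptions.

import os

import modal
from fastapi import Header, HTTPException, Response  # fastapi must be installed in the environment running modal deploy

app = modal.App("stable-audio-open")

# Illustrative container image; the real src/api.py may pin different packages
image = modal.Image.debian_slim().pip_install(
    "torch", "torchaudio", "einops", "stable-audio-tools", "fastapi[standard]"
)


@app.function(
    image=image,
    gpu="A10G",  # assumed GPU type
    secrets=[modal.Secret.from_name("stable-audio-open")],  # assumed secret name; must expose HF_TOKEN and AUTH_TOKEN
)
@modal.web_endpoint(method="POST")  # renamed fastapi_endpoint in newer Modal releases
def generate(body: dict, authorization: str = Header(None)):
    # Reject requests that don't carry the expected bearer token
    if authorization != f"Bearer {os.environ['AUTH_TOKEN']}":
        raise HTTPException(status_code=401, detail="Unauthorized")

    prompt = body.get("prompt", "")
    wav_bytes = run_inference(prompt)  # hypothetical helper wrapping the generation flow shown earlier
    return Response(content=wav_bytes, media_type="audio/wav")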
