Use Microsoft Edge's online text-to-speech service from JS code directly!

ericc-ch/edge-tts

Edge TTS

A TypeScript library for generating speech using Microsoft Edge's text-to-speech API

Generate speech from text using Microsoft Edge's text-to-speech service. This library provides access to Edge's TTS capabilities with subtitle generation support and voice customization options.

Installation

```shell
npm install @echristian/edge-tts
```

CLI Usage

```shell
# List all available voices grouped by locale
npx @echristian/edge-tts voices

# Generate audio from text
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --voice en-US-AvaNeural

# Generate audio with subtitles
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --subtitle output.srt --voice en-US-AvaNeural
```

API Usage

```typescript
import { synthesize, synthesizeStream, getVoices } from "@echristian/edge-tts";

// Get available voices
const voices = await getVoices();
console.log(voices); // Array of available voice options

// Basic usage with synthesize()
const { audio, subtitle } = await synthesize({
  text: "Hello, world!",
});

// Stream processing usage
const generator = synthesizeStream({ text: "Hello world" });
for await (const chunk of generator) {
  // chunk is a Uint8Array of raw audio data
  // Process or save each chunk as needed
}

// Collecting all streamed chunks
const chunks: Uint8Array[] = [];
for await (const chunk of synthesizeStream({ text: "Hello world" })) {
  chunks.push(chunk);
}
```
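
Once the chunks are collected, they usually need to be joined into one contiguous buffer before being written to a file or wrapped in a `Blob`. A minimal sketch of that step follows; `concatChunks` is a hypothetical helper written for illustration, not part of the library, and the sample bytes stand in for real audio data.

```typescript
// Hypothetical helper (not a library export): joins streamed chunks
// into a single Uint8Array, preserving their order.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.length;
  }
  return out;
}

// Stand-in data; in practice these would be the chunks yielded by synthesizeStream().
const demoChunks = [
  new Uint8Array([1, 2]),
  new Uint8Array([3]),
  new Uint8Array([4, 5, 6]),
];
const merged = concatChunks(demoChunks);
console.log(merged.length); // 6
```

Allocating the output once and copying with `set()` avoids the repeated reallocation that spreading chunks into a growing array would cause.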

API

getVoices(): Promise<Array>

Returns an array of available voices with their properties.

Voice Object

| Property | Type | Description |
| --- | --- | --- |
| Name | string | Full name of the voice |
| ShortName | string | Short identifier for the voice |
| Gender | string | Voice gender (Male/Female) |
| Locale | string | Language code and region |
| FriendlyName | string | Display name for the voice |
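
The voice list can be narrowed down with ordinary array filtering on the properties above. The sketch below assumes the array elements have exactly the shape in the table; the `Voice` interface and `voicesForLocale` helper are hypothetical names, and every sample entry other than the `en-US-AvaNeural` short name is a placeholder rather than real service data.

```typescript
// Shape assumed from the Voice Object table; the library may export its own type.
interface Voice {
  Name: string;
  ShortName: string;
  Gender: string;
  Locale: string;
  FriendlyName: string;
}

// Hypothetical helper: keeps voices matching a locale and, optionally, a gender.
function voicesForLocale(voices: Voice[], locale: string, gender?: string): Voice[] {
  return voices.filter(
    (v) => v.Locale === locale && (gender === undefined || v.Gender === gender),
  );
}

// Placeholder data standing in for the result of getVoices().
const sampleVoices: Voice[] = [
  { Name: "placeholder-1", ShortName: "en-US-AvaNeural", Gender: "Female", Locale: "en-US", FriendlyName: "placeholder-1" },
  { Name: "placeholder-2", ShortName: "xx-XX-PlaceholderA", Gender: "Male", Locale: "en-US", FriendlyName: "placeholder-2" },
  { Name: "placeholder-3", ShortName: "xx-XX-PlaceholderB", Gender: "Female", Locale: "xx-XX", FriendlyName: "placeholder-3" },
];

const femaleUsVoices = voicesForLocale(sampleVoices, "en-US", "Female");
console.log(femaleUsVoices.map((v) => v.ShortName)); // [ 'en-US-AvaNeural' ]
```

The `ShortName` of a selected entry is presumably what the `voice` option and the CLI `--voice` flag expect, since both use values such as `en-US-AvaNeural`.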

synthesize(options): Promise

Main function to generate speech from text.

synthesizeStream(options): AsyncGenerator

Creates an async generator that yields chunks of processed audio data. Each chunk has metadata headers automatically removed.

Uses the same options as synthesize(), but without subtitle support:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | (required) | Text to convert to speech |
| voice | string | "en-US-AvaNeural" | Voice ID to use |
| language | string | "en-US" | Language code |
| outputFormat | string | "audio-24khz-96kbitrate-mono-mp3" | Audio format |
| rate | string | "default" | Speaking rate |
| pitch | string | "default" | Voice pitch |
| volume | string | "default" | Audio volume |

For detailed configuration options, refer to Microsoft's documentation:

Note: Some options may be limited by Microsoft Edge's service capabilities.
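
To make the documented defaults concrete, the sketch below fills in every omitted option with the value from the table. It only illustrates the defaults; `StreamOptions`, `DEFAULTS`, and `resolveOptions` are hypothetical names, and the library's internal handling may differ.

```typescript
// Option shape assumed from the table above; not the library's exported type.
interface StreamOptions {
  text: string;
  voice?: string;
  language?: string;
  outputFormat?: string;
  rate?: string;
  pitch?: string;
  volume?: string;
}

// Default values copied from the options table.
const DEFAULTS = {
  voice: "en-US-AvaNeural",
  language: "en-US",
  outputFormat: "audio-24khz-96kbitrate-mono-mp3",
  rate: "default",
  pitch: "default",
  volume: "default",
};

// Hypothetical helper: returns a fully populated options object.
// Using ?? means an explicit `undefined` also falls back to the default.
function resolveOptions(options: StreamOptions): Required<StreamOptions> {
  return {
    text: options.text,
    voice: options.voice ?? DEFAULTS.voice,
    language: options.language ?? DEFAULTS.language,
    outputFormat: options.outputFormat ?? DEFAULTS.outputFormat,
    rate: options.rate ?? DEFAULTS.rate,
    pitch: options.pitch ?? DEFAULTS.pitch,
    volume: options.volume ?? DEFAULTS.volume,
  };
}

const resolved = resolveOptions({ text: "Hello world", rate: undefined });
console.log(resolved.voice); // en-US-AvaNeural
console.log(resolved.rate); // default
```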

GenerateOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | (required) | Text to convert to speech |
| voice | string | "en-US-AvaNeural" | Voice ID to use |
| language | string | "en-US" | Language code |
| outputFormat | string | "audio-24khz-96kbitrate-mono-mp3" | Audio format |
| rate | string | "default" | Speaking rate |
| pitch | string | "default" | Voice pitch |
| volume | string | "default" | Audio volume |
| subtitle | SubtitleOptions | { splitBy: "word", wordsPerCue: 10 } | Subtitle options |

SubtitleOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| splitBy | "word" \| "duration" | "word" | How to split subtitles |
| wordsPerCue | number | 10 | Words per subtitle when using 'word' |
| durationPerCue | number | 5000 | Duration (ms) when using 'duration' |
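
The two splitting modes can be pictured as different rules for closing a cue. The following is a sketch of one plausible interpretation, not the library's actual implementation: it assumes per-word timings are available, closes a cue after `wordsPerCue` words in `"word"` mode, and closes a cue once it spans at least `durationPerCue` milliseconds in `"duration"` mode. `WordTiming`, `Cue`, `SplitOptions`, and `groupIntoCues` are hypothetical names.

```typescript
// Hypothetical input: one entry per spoken word, times in milliseconds.
interface WordTiming { text: string; start: number; end: number; }
// Output shaped like the documented SubtitleResult.
interface Cue { text: string; start: number; end: number; duration: number; }
// Local stand-in for the documented SubtitleOptions.
interface SplitOptions {
  splitBy: "word" | "duration";
  wordsPerCue?: number;
  durationPerCue?: number;
}

function groupIntoCues(words: WordTiming[], opts: SplitOptions): Cue[] {
  const wordsPerCue = opts.wordsPerCue ?? 10; // documented default
  const durationPerCue = opts.durationPerCue ?? 5000; // documented default
  const cues: Cue[] = [];
  let current: WordTiming[] = [];

  // Emit the words gathered so far as one cue.
  const flush = (): void => {
    if (current.length === 0) return;
    const start = current[0].start;
    const end = current[current.length - 1].end;
    cues.push({ text: current.map((w) => w.text).join(" "), start, end, duration: end - start });
    current = [];
  };

  for (const word of words) {
    current.push(word);
    const full =
      opts.splitBy === "word"
        ? current.length >= wordsPerCue
        : word.end - current[0].start >= durationPerCue;
    if (full) flush();
  }
  flush(); // emit any trailing partial cue
  return cues;
}

// Five placeholder words, 400 ms each.
const timings: WordTiming[] = ["a", "b", "c", "d", "e"].map((text, i) => ({
  text,
  start: i * 400,
  end: (i + 1) * 400,
}));

console.log(groupIntoCues(timings, { splitBy: "word", wordsPerCue: 2 }).map((c) => c.text));
// [ 'a b', 'c d', 'e' ]
console.log(groupIntoCues(timings, { splitBy: "duration", durationPerCue: 1000 }).map((c) => c.text));
// [ 'a b c', 'd e' ]
```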

GenerateResult

| Property | Type | Description |
| --- | --- | --- |
| audio | Blob | Generated audio data |
| subtitle | Array | Generated subtitles |

SubtitleResult

| Property | Type | Description |
| --- | --- | --- |
| text | string | Subtitle text |
| start | number | Start time (ms) |
| end | number | End time (ms) |
| duration | number | Duration (ms) |
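
Because each subtitle entry carries millisecond timestamps, converting the array to SubRip (`.srt`) text is mostly a formatting exercise. The sketch below assumes entries shaped like the table above; `formatTimestamp` and `toSrt` are hypothetical helpers, not library exports, and the CLI's `--subtitle` flag may produce its file differently.

```typescript
// Shape assumed from the SubtitleResult table.
interface SubtitleResult {
  text: string;
  start: number;
  end: number;
  duration: number;
}

// Format milliseconds as the SRT timestamp HH:MM:SS,mmm.
function formatTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3_600_000);
  const minutes = Math.floor((total % 3_600_000) / 60_000);
  const seconds = Math.floor((total % 60_000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Build SRT text: a 1-based index, a time range, the cue text, then a blank line.
function toSrt(cues: SubtitleResult[]): string {
  return cues
    .map(
      (cue, i) =>
        `${i + 1}\n${formatTimestamp(cue.start)} --> ${formatTimestamp(cue.end)}\n${cue.text}\n`,
    )
    .join("\n");
}

const srt = toSrt([
  { text: "Hello world", start: 0, end: 1500, duration: 1500 },
  { text: "Second cue", start: 1500, end: 3723004, duration: 3721504 },
]);
console.log(srt);
```

With the sample data, the first cue is rendered as `00:00:00,000 --> 00:00:01,500` and the second ends at `01:02:03,004`.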