A TypeScript library for generating speech using Microsoft Edge's text-to-speech API
Generate speech from text using Microsoft Edge's text-to-speech service. This library provides access to Edge's TTS capabilities with subtitle generation support and voice customization options.
npm install @echristian/edge-tts
# List all available voices grouped by locale
npx @echristian/edge-tts voices
# Generate audio from text
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --voice en-US-AvaNeural
# Generate audio with subtitles
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --subtitle output.srt --voice en-US-AvaNeural
import { synthesize, synthesizeStream, getVoices } from "@echristian/edge-tts";
// Get available voices
const voices = await getVoices();
console.log(voices); // Array of available voice options
// Basic usage with synthesize()
const { audio, subtitle } = await synthesize({
  text: "Hello, world!",
});
// Stream processing usage
const generator = synthesizeStream({ text: "Hello world" });
for await (const chunk of generator) {
  // chunk is a Uint8Array of raw audio data
  // Process or save each chunk as needed
}
// Collecting all streamed chunks
const chunks: Uint8Array[] = [];
for await (const chunk of synthesizeStream({ text: "Hello world" })) {
  chunks.push(chunk);
}
getVoices() resolves to an array of available voices. Each voice has the following properties:
| Property | Type | Description |
| --- | --- | --- |
| Name | string | Full name of the voice |
| ShortName | string | Short identifier for the voice |
| Gender | string | Voice gender (Male/Female) |
| Locale | string | Language code and region |
| FriendlyName | string | Display name for the voice |
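The CLI lists voices grouped by locale, and the same grouping can be applied to the array from getVoices(). The following is a minimal sketch, assuming each voice object has the properties in the table above. The `Voice` interface and the `groupByLocale` helper are hypothetical and not exported by the library, and the sample entries are illustrative data only.

```typescript
// Shape of a voice entry, mirroring the property table (assumed, not imported from the library).
interface Voice {
  Name: string;
  ShortName: string;
  Gender: string;
  Locale: string;
  FriendlyName: string;
}

// Hypothetical helper: bucket voices by their Locale value.
function groupByLocale(voices: Voice[]): Record<string, Voice[]> {
  const groups: Record<string, Voice[]> = {};
  for (const voice of voices) {
    (groups[voice.Locale] ??= []).push(voice);
  }
  return groups;
}

// Illustrative sample data standing in for the result of getVoices().
const sampleVoices: Voice[] = [
  { Name: "Sample voice 1", ShortName: "en-US-AvaNeural", Gender: "Female", Locale: "en-US", FriendlyName: "Ava (sample)" },
  { Name: "Sample voice 2", ShortName: "en-US-AndrewNeural", Gender: "Male", Locale: "en-US", FriendlyName: "Andrew (sample)" },
  { Name: "Sample voice 3", ShortName: "de-DE-KatjaNeural", Gender: "Female", Locale: "de-DE", FriendlyName: "Katja (sample)" },
];

const voicesByLocale = groupByLocale(sampleVoices);
for (const locale of Object.keys(voicesByLocale)) {
  const shortNames = voicesByLocale[locale].map((v) => v.ShortName).join(", ");
  console.log(`${locale}: ${shortNames}`);
}
```

With the library installed, `groupByLocale(await getVoices())` would be the equivalent call, provided the returned objects match the table.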
synthesize() is the main function for generating speech from text.
synthesizeStream() creates an async generator that yields chunks of processed audio data. Metadata headers are removed from each chunk automatically.
It accepts the same options as synthesize(), but without subtitle support:
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | (required) | Text to convert to speech |
| voice | string | "en-US-AvaNeural" | Voice ID to use |
| language | string | "en-US" | Language code |
| outputFormat | string | "audio-24khz-96kbitrate-mono-mp3" | Audio format |
| rate | string | "default" | Speaking rate |
| pitch | string | "default" | Voice pitch |
| volume | string | "default" | Audio volume |
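Because synthesizeStream() yields Uint8Array chunks, the pieces usually have to be joined before they can be saved as one file. The following is a minimal sketch of that step. `concatChunks` and `collectChunks` are hypothetical helper names, and `fakeStream` is a stand-in for `synthesizeStream({ text })` so the example runs without network access.

```typescript
// Join several byte chunks into one contiguous Uint8Array.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.length;
  }
  return out;
}

// Drain any async source of chunks, then join them.
async function collectChunks(source: AsyncIterable<Uint8Array>): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  for await (const chunk of source) {
    chunks.push(chunk);
  }
  return concatChunks(chunks);
}

// Stand-in for synthesizeStream(): yields two small fake chunks.
async function* fakeStream(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2, 3]);
  yield new Uint8Array([4, 5]);
}

collectChunks(fakeStream()).then((bytes) => {
  console.log(`collected ${bytes.length} bytes`);
});
```

In real use, `synthesizeStream({ text: "Hello world" })` would take the place of `fakeStream()`, and the joined bytes could be written out with `writeFile` from `node:fs/promises`.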
For detailed configuration options, refer to Microsoft's documentation:
Note: Some options may be limited by Microsoft Edge's service capabilities.
Options for synthesize():

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | (required) | Text to convert to speech |
| voice | string | "en-US-AvaNeural" | Voice ID to use |
| language | string | "en-US" | Language code |
| outputFormat | string | "audio-24khz-96kbitrate-mono-mp3" | Audio format |
| rate | string | "default" | Speaking rate |
| pitch | string | "default" | Voice pitch |
| volume | string | "default" | Audio volume |
| subtitle | SubtitleOptions | { splitBy: "word", wordsPerCue: 10 } | Subtitle options |
SubtitleOptions:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| splitBy | "word" \| "duration" | "word" | How to split subtitles |
| wordsPerCue | number | 10 | Words per subtitle cue when splitBy is "word" |
| durationPerCue | number | 5000 | Duration per cue (ms) when splitBy is "duration" |
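To make the two splitBy modes concrete, here is a rough sketch of how word-level timings might be grouped into cues. It illustrates the idea only and is not the library's implementation: `WordTiming`, `Cue`, `toCue`, `splitByWord`, and `splitByDuration` are hypothetical names, and the sample timings are invented.

```typescript
interface WordTiming { text: string; start: number; end: number } // times in ms
interface Cue { text: string; start: number; end: number; duration: number }

// Merge a non-empty run of consecutive words into one cue.
function toCue(words: WordTiming[]): Cue {
  const start = words[0].start;
  const end = words[words.length - 1].end;
  return { text: words.map((w) => w.text).join(" "), start, end, duration: end - start };
}

// "word" mode: start a new cue every `wordsPerCue` words.
function splitByWord(words: WordTiming[], wordsPerCue = 10): Cue[] {
  const step = Math.max(1, Math.floor(wordsPerCue)); // guard against 0 or fractions
  const cues: Cue[] = [];
  for (let i = 0; i < words.length; i += step) {
    cues.push(toCue(words.slice(i, i + step)));
  }
  return cues;
}

// "duration" mode: start a new cue once adding a word would exceed `durationPerCue` ms.
function splitByDuration(words: WordTiming[], durationPerCue = 5000): Cue[] {
  const cues: Cue[] = [];
  let current: WordTiming[] = [];
  for (const word of words) {
    if (current.length > 0 && word.end - current[0].start > durationPerCue) {
      cues.push(toCue(current));
      current = [];
    }
    current.push(word);
  }
  if (current.length > 0) cues.push(toCue(current));
  return cues;
}

// Invented timings for illustration.
const sampleWords: WordTiming[] = [
  { text: "Hello", start: 0, end: 400 },
  { text: "world", start: 450, end: 900 },
  { text: "again", start: 1000, end: 1500 },
];

console.log(splitByWord(sampleWords, 2));
console.log(splitByDuration(sampleWords, 500));
```

In this sketch a single word longer than `durationPerCue` still becomes its own cue rather than being dropped or cut; the actual service-side behavior may differ.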
synthesize() resolves to an object with these properties:

| Property | Type | Description |
| --- | --- | --- |
| audio | Blob | Generated audio data |
| subtitle | Array | Generated subtitles |
Each entry in the subtitle array has these properties:

| Property | Type | Description |
| --- | --- | --- |
| text | string | Subtitle text |
| start | number | Start time (ms) |
| end | number | End time (ms) |
| duration | number | Duration (ms) |
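The CLI example writes subtitles to an .srt file. When calling synthesize() directly, the returned subtitle entries can be rendered into SRT text by hand. The following is a minimal sketch, assuming entries carry the text, start, and end fields listed above with times in milliseconds. `SubtitleCue`, `formatSrtTime`, and `toSrt` are hypothetical names, not library exports.

```typescript
// Assumed shape of one subtitle entry, mirroring the table above.
interface SubtitleCue { text: string; start: number; end: number; duration: number }

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Render cues as numbered SRT blocks separated by blank lines.
function toSrt(cues: SubtitleCue[]): string {
  return cues
    .map((cue, i) => `${i + 1}\n${formatSrtTime(cue.start)} --> ${formatSrtTime(cue.end)}\n${cue.text}\n`)
    .join("\n");
}

// Invented cues for illustration.
const sampleCues: SubtitleCue[] = [
  { text: "Hello", start: 0, end: 900, duration: 900 },
  { text: "world", start: 1000, end: 1500, duration: 500 },
];

console.log(toSrt(sampleCues));
```

With real output, `toSrt(subtitle)` would take the array destructured from the synthesize() result, assuming its entries match the table.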