Gemini Realtime API over WebRTC, similar to the OpenAI Realtime API with WebRTC.
Features:
- Real-time voice communication with Gemini AI
- High-quality audio processing:
  - 48 kHz sample rate support
  - Opus codec for efficient audio compression
  - Automatic audio resampling
  - Smart audio buffering with 100 ms accumulation
- WebRTC-based communication:
  - Low-latency audio streaming
  - Reliable data channel for text messages
- Debug capabilities:
  - Configurable audio dumping for all streams
  - Detailed logging
  - PCM file format support
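Here is what the 100 ms accumulation means in numbers. At 48 kHz mono, 100 ms is 4,800 samples, or 9,600 bytes of 16-bit PCM. The Go sketch below is a hypothetical illustration, not the project's pkg/audio API. It assumes mono int16 samples and holds incoming samples back until a full 100 ms chunk is available. It also assumes 20 ms frames, a common Opus frame size in WebRTC.

```go
package main

import "fmt"

// Assumed format: 48 kHz, mono, 16-bit signed PCM.
const (
	sampleRate      = 48000                           // samples per second
	chunkMillis     = 100                             // accumulation window in ms
	samplesPerChunk = sampleRate * chunkMillis / 1000 // 4800 samples
)

// accumulator is a hypothetical helper that collects incoming PCM samples
// and releases them only in fixed 100 ms chunks.
type accumulator struct {
	buf []int16
}

// Push appends samples and returns every complete 100 ms chunk now available.
// Leftover samples stay buffered until a later call completes a chunk.
func (a *accumulator) Push(samples []int16) [][]int16 {
	a.buf = append(a.buf, samples...)
	var chunks [][]int16
	for len(a.buf) >= samplesPerChunk {
		chunk := make([]int16, samplesPerChunk)
		copy(chunk, a.buf[:samplesPerChunk])
		chunks = append(chunks, chunk)
		a.buf = a.buf[samplesPerChunk:]
	}
	return chunks
}

// Pending reports how many samples are waiting for a full chunk.
func (a *accumulator) Pending() int { return len(a.buf) }

func main() {
	var acc accumulator
	// A 20 ms frame at 48 kHz decodes to 960 samples.
	frame := make([]int16, 960)
	total := 0
	for i := 0; i < 12; i++ { // 12 frames = 240 ms of audio
		total += len(acc.Push(frame))
	}
	fmt.Println(total, acc.Pending()) // 2 1920
}
```

Twelve 20 ms frames yield two full 100 ms chunks, and 40 ms (1,920 samples) remains buffered.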
Prerequisites:
- Go 1.21 or higher
- FFmpeg libraries (for audio processing)
- Opus codec library
- A Google API key with access to the Gemini API
Installation:
- Install system dependencies:
  # For Debian/Ubuntu
  sudo apt-get install pkg-config libopus-dev libavcodec-dev libavformat-dev libavutil-dev libswresample-dev
  # For macOS
  brew install opus ffmpeg
- Clone the repository:
  git clone https://github.com/realtime-ai/gemini-realtime-webrtc.git
  cd gemini-realtime-webrtc
- Install Go dependencies:
  go mod download
- Set up environment variables:
  # Required
  export GOOGLE_API_KEY=your_api_key_here
  # Optional (for audio debugging)
  export DUMP_SESSION_AUDIO=true  # Dump AI response audio
  export DUMP_REMOTE_AUDIO=true   # Dump user input audio
  export DUMP_LOCAL_AUDIO=true    # Dump playback audio
Usage:
- Start the server:
  go run main.go
- Open the web client:
  - Open tests/gemini_realtime_webrtc.html in your browser
  - Click "Connect" to establish a WebRTC connection
  - Allow microphone access when prompted
Project structure:
- pkg/gateway: WebRTC server and connection management
- pkg/audio: Audio processing utilities
  - Resampling between different sample rates
  - Audio buffering with smart accumulation
  - PCM/WAV file handling
- pkg/utils: Common utilities and helper functions
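Resampling converts audio between sample rates, for example from the 48 kHz WebRTC stream down to a lower rate. The function below is a hypothetical illustration that uses plain linear interpolation. It is not what pkg/audio does: the project relies on FFmpeg, whose libswresample filters properly. Because this sketch applies no anti-aliasing filter, it suits demonstration only. It assumes mono int16 samples, and the 16 kHz target in the example is also an assumption.

```go
package main

import "fmt"

// resampleLinear converts mono 16-bit PCM from srcRate to dstRate by linear
// interpolation between neighbouring input samples. No low-pass filter is
// applied, so downsampling may alias.
func resampleLinear(in []int16, srcRate, dstRate int) []int16 {
	if len(in) == 0 || srcRate <= 0 || dstRate <= 0 {
		return nil
	}
	outLen := int(int64(len(in)) * int64(dstRate) / int64(srcRate))
	out := make([]int16, outLen)
	step := float64(srcRate) / float64(dstRate) // input samples per output sample
	for i := range out {
		pos := float64(i) * step
		idx := int(pos)
		if idx >= len(in) {
			idx = len(in) - 1
		}
		frac := pos - float64(idx)
		next := idx + 1
		if next >= len(in) {
			next = len(in) - 1 // hold the last sample at the end of the buffer
		}
		v := float64(in[idx])*(1-frac) + float64(in[next])*frac
		out[i] = int16(v)
	}
	return out
}

func main() {
	// 10 ms of a 48 kHz ramp (480 samples) becomes 160 samples at 16 kHz.
	in := make([]int16, 480)
	for i := range in {
		in[i] = int16(i)
	}
	out := resampleLinear(in, 48000, 16000)
	fmt.Println(len(out), out[1]) // 160 3
}
```

For 48 kHz to 16 kHz the step is exactly 3, so the sketch reduces to keeping every third sample. That is why a proper filtering resampler matters for real speech audio.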
Development:
- Build:
  go build -o server
- Run tests:
  go test ./...
Contributing:
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request