How do I play a pre-recorded message in the entrypoint? #359

Open
ylhan opened this issue Feb 11, 2025 · 1 comment
Comments

ylhan commented Feb 11, 2025

Based on the examples, it's normal to have a greeting at the end of the entrypoint. Something like:

await agent.say("Welcome, I'm a friendly assistant...", allow_interruptions=True)

This message is repetitive, and re-generating it ($$$) every time just burns tokens for no good reason. How do I play a pre-recorded message from the agent here instead?

I think I can reverse-engineer VoicePipelineAgent.say(...) and inject a WAV there, but I'm curious if there's an easier way.

ylhan commented Feb 11, 2025

Thanks, Claude. This is ugly, but it works:

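import asyncio
import wave

import numpy as np
from livekit.rtc import AudioFrame, AudioSource, LocalAudioTrack


async def play_greeting_file(local_participant, wav_path: str = "greeting.wav"):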
    # Read WAV file
    with wave.open(wav_path, 'rb') as wav_file:
        # Get wav file properties
        sample_rate = wav_file.getframerate()
        num_channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()
        
        print(f"Audio properties: rate={sample_rate}, channels={num_channels}, width={sample_width}")
        
        # Create audio source with matching parameters
        audio_source = AudioSource(
            sample_rate=sample_rate,
            num_channels=num_channels,
            queue_size_ms=5000  # 5 second buffer
        )
        
        # Create and publish track
        track = LocalAudioTrack.create_audio_track("greeting", audio_source)
        await local_participant.publish_track(track)
        
        # Add a small delay to ensure everything is ready
        await asyncio.sleep(0.5)
        
        # Read and send audio data
        chunk_size = sample_rate // 10  # 100ms chunks
        while True:
            raw_data = wav_file.readframes(chunk_size)
            if not raw_data:
                break
                
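            # Count samples to size the frame; this assumes 16-bit PCM (sample_width == 2).
            # The raw bytes themselves are passed to AudioFrame unchanged.
            samples = np.frombuffer(raw_data, dtype=np.int16)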
            
            frame = AudioFrame(
                data=raw_data,
                sample_rate=sample_rate,
                num_channels=num_channels,
                samples_per_channel=len(samples) // num_channels
            )
            
            await audio_source.capture_frame(frame)
        
        # Wait for audio to finish playing
        await audio_source.wait_for_playout()
        
        # Cleanup
        await audio_source.aclose()
        await local_participant.unpublish_track(track.sid)

# Usage: call this from inside the async entrypoint, e.g.
#     await play_greeting_file(ctx.room.local_participant)
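
One caveat with the function above: it counts samples with np.int16, so it quietly assumes the WAV is 16-bit PCM. Below is a minimal sketch of the chunking step on its own, using only the standard library, that rejects other sample widths up front. The names `iter_wav_chunks` and `chunk_ms` are hypothetical helpers for illustration and are not part of LiveKit.

```python
import wave
from typing import Iterator, Tuple


def iter_wav_chunks(wav_path: str, chunk_ms: int = 100) -> Iterator[Tuple[bytes, int]]:
    """Yield (raw_pcm_bytes, samples_per_channel) for successive chunks of a WAV file."""
    with wave.open(wav_path, "rb") as wav_file:
        sample_width = wav_file.getsampwidth()
        # The playback code above treats samples as int16, so reject anything else.
        if sample_width != 2:
            raise ValueError(f"expected 16-bit PCM, got {sample_width * 8}-bit")

        num_channels = wav_file.getnchannels()
        frames_per_chunk = wav_file.getframerate() * chunk_ms // 1000
        bytes_per_frame = sample_width * num_channels

        while True:
            raw = wav_file.readframes(frames_per_chunk)
            if not raw:
                break
            # The last chunk may be shorter than frames_per_chunk.
            yield raw, len(raw) // bytes_per_frame
```

Each yielded pair could then be passed as `data` and `samples_per_channel` when building the AudioFrame in the loop above, which would also remove the need for numpy there.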
