
v0.0.76

@markbackman released this 12 Jul 00:29
1836a74

Added

  • Added SpeechControlParamsFrame, a new SystemFrame that notifies downstream processors of the VAD and turn analyzer params. BaseInputTransport pushes this frame when it starts and again whenever it receives a VADParamsUpdateFrame.
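A downstream processor would typically cache the params carried by SpeechControlParamsFrame so it can base its own timing on them, overwriting them each time a new frame arrives. The sketch below uses plain-Python stand-ins, not the real Pipecat classes: `VADParamsStandIn`, `SpeechControlParamsFrameStandIn`, and `ParamsTrackingProcessor` are hypothetical, and real Pipecat processors are asynchronous, so this synchronous version only illustrates the caching logic.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins for illustration only; not the real Pipecat classes.
@dataclass
class VADParamsStandIn:
    stop_secs: float = 0.8  # VAD stop_secs default cited in these notes


@dataclass
class SpeechControlParamsFrameStandIn:
    vad_params: Optional[VADParamsStandIn] = None
    turn_params: Optional[Any] = None


class ParamsTrackingProcessor:
    """Remembers the most recent VAD params it has seen."""

    def __init__(self) -> None:
        self.vad_params: Optional[VADParamsStandIn] = None

    def process_frame(self, frame: object) -> object:
        if isinstance(frame, SpeechControlParamsFrameStandIn) and frame.vad_params:
            # The frame is re-sent whenever the params change,
            # so always keep the latest values.
            self.vad_params = frame.vad_params
        # Pass every frame through unchanged.
        return frame


proc = ParamsTrackingProcessor()
proc.process_frame(SpeechControlParamsFrameStandIn(VADParamsStandIn()))
proc.process_frame(SpeechControlParamsFrameStandIn(VADParamsStandIn(stop_secs=0.5)))
print(proc.vad_params.stop_secs)  # 0.5
```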

Changed

  • Two package dependencies have been updated:
    • numpy now supports 1.26.0 and newer
    • transformers now supports 4.48.0 and newer

Fixed

  • Fixed an issue with RTVI's handling of append-to-context.

  • Fixed an issue where using audio input with a sample rate requiring resampling could result in empty audio being passed to STT services, causing errors.

  • Fixed the VAD analyzer to process the full audio buffer as long as it contains more than the minimum required bytes per iteration, instead of only analyzing the first chunk.

  • Fixed an issue in ParallelPipeline that caused errors when attempting to drain the queues.

  • Fixed an inconsistent emulated VAD timeout in LLMUserContextAggregator. Previously, emulated VAD scenarios (where a transcription is received without a VAD detection) used a hardcoded aggregation_timeout (default 0.5s) instead of the VAD's stop_secs parameter (default 0.8s), so real and emulated VAD produced different user experiences. Emulated VAD timeouts now automatically match the VAD's stop_secs parameter.

  • Fixed a pipeline freeze when using AWS Nova Sonic that could occur if the user started speaking early, while the bot was still working through trigger_assistant_response().
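For the resampling fix: a streaming resampler can legitimately return no samples for a short input chunk while it buffers enough input to produce output. A minimal sketch of guarding against that, assuming a toy 3:1 decimator (for example 48 kHz to 16 kHz) and a hypothetical `send_to_stt` callback in place of a real STT service; this is not Pipecat's resampler.

```python
from typing import Callable, List


class ToyDecimator:
    """Toy streaming downsampler that keeps every `factor`-th sample.

    Leftover samples are buffered between calls, so a short input chunk
    can produce an empty output. No anti-aliasing filter is applied.
    """

    def __init__(self, factor: int) -> None:
        self.factor = factor
        self._pending: List[int] = []

    def process(self, samples: List[int]) -> List[int]:
        self._pending.extend(samples)
        # Only consume whole groups of `factor` samples.
        usable = len(self._pending) - len(self._pending) % self.factor
        out = self._pending[:usable:self.factor]
        self._pending = self._pending[usable:]
        return out


def forward_audio(chunk: List[int], resampler: ToyDecimator,
                  send_to_stt: Callable[[List[int]], None]) -> None:
    resampled = resampler.process(chunk)
    # Skip empty results instead of handing zero-length audio to STT.
    if resampled:
        send_to_stt(resampled)


sent: List[List[int]] = []
decimator = ToyDecimator(factor=3)
forward_audio([1, 2], decimator, sent.append)           # buffered, nothing sent
forward_audio([3, 4, 5, 6, 7], decimator, sent.append)
print(sent)  # [[1, 4]]
```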
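For the VAD analyzer fix: the corrected behavior can be pictured as a loop that keeps consuming fixed-size windows while the buffer still holds at least one full window, instead of analyzing only the first one. In this sketch the 512-sample window, 16-bit mono framing, and the `analyze_window` callback are assumptions for illustration, not Pipecat internals.

```python
from typing import Callable, List, Tuple


def analyze_buffer(buffer: bytes, bytes_per_window: int,
                   analyze_window: Callable[[bytes], float]) -> Tuple[List[float], bytes]:
    """Run `analyze_window` on every complete window in `buffer`.

    Returns the per-window results and the leftover bytes (shorter than
    one window), which would be kept for the next incoming audio chunk.
    """
    results: List[float] = []
    offset = 0
    # Buggy behavior: analyze only buffer[:bytes_per_window].
    # Fixed behavior: keep going while a full window remains.
    while len(buffer) - offset >= bytes_per_window:
        window = buffer[offset:offset + bytes_per_window]
        results.append(analyze_window(window))
        offset += bytes_per_window
    return results, buffer[offset:]


# 512 samples of 16-bit mono audio = 1024 bytes per window (assumed).
BYTES_PER_WINDOW = 512 * 2
audio = bytes(BYTES_PER_WINDOW * 3 + 100)
scores, leftover = analyze_buffer(audio, BYTES_PER_WINDOW, lambda w: 0.0)
print(len(scores), len(leftover))  # 3 100
```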
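For the ParallelPipeline fix: a common, safe way to drain an `asyncio.Queue` is to pull items with `get_nowait()` until `asyncio.QueueEmpty` is raised, rather than awaiting `get()`, which blocks forever once the queue is empty. Calling `task_done()` for each drained item also keeps a pending `join()` from hanging. This is a generic asyncio sketch, not the actual ParallelPipeline change; `drain_queue` is a hypothetical helper.

```python
import asyncio
from typing import Any, List


def drain_queue(queue: "asyncio.Queue[Any]") -> List[Any]:
    """Remove and return every item currently in `queue` without blocking."""
    drained: List[Any] = []
    while True:
        try:
            item = queue.get_nowait()
        except asyncio.QueueEmpty:
            break
        drained.append(item)
        # Mark the item as handled so a pending queue.join() can complete.
        queue.task_done()
    return drained


async def main() -> None:
    queue: "asyncio.Queue[int]" = asyncio.Queue()
    for i in range(3):
        queue.put_nowait(i)
    print(drain_queue(queue))  # [0, 1, 2]
    # Every item was marked done, so join() returns immediately.
    await asyncio.wait_for(queue.join(), timeout=1)


asyncio.run(main())
```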
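For the emulated VAD timeout fix in LLMUserContextAggregator: the change amounts to preferring the VAD's `stop_secs` whenever VAD params are known. A minimal sketch, where `emulated_vad_timeout` is a hypothetical helper, the 0.5s and 0.8s defaults are the ones quoted in that note, and falling back to `aggregation_timeout` when no VAD params are available is an assumption.

```python
from typing import Optional

DEFAULT_AGGREGATION_TIMEOUT = 0.5  # seconds, previous hardcoded default
DEFAULT_VAD_STOP_SECS = 0.8        # seconds, VAD default


def emulated_vad_timeout(vad_stop_secs: Optional[float],
                         aggregation_timeout: float = DEFAULT_AGGREGATION_TIMEOUT) -> float:
    """Pick how long to wait after a transcription that arrived without VAD.

    Hypothetical helper: uses the VAD's stop_secs when it is known (for
    example, from the params a SpeechControlParamsFrame carries; an
    assumption), otherwise falls back to the aggregation timeout (also
    an assumption).
    """
    if vad_stop_secs is not None:
        return vad_stop_secs
    return aggregation_timeout


print(emulated_vad_timeout(DEFAULT_VAD_STOP_SECS))  # 0.8
print(emulated_vad_timeout(None))                   # 0.5
```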