v0.0.76

markbackman released this 12 Jul 00:29

1836a74

Added

Added SpeechControlParamsFrame, a new SystemFrame that notifies downstream processors of the VAD and Turn analyzer params. This frame is pushed by the BaseInputTransport at Start and any time a VADParamsUpdateFrame is received.
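
As a rough illustration of how a downstream processor might react to such a frame, the sketch below uses stand-in dataclasses rather than Pipecat's real classes. The field names (`vad_params`, `turn_params`, `stop_secs`) and the `TimeoutTracker` class are assumptions made for this example, not the library's confirmed API. It shows one possible use: keeping a local timeout in step with the VAD's stop time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VADParams:
    """Stand-in for VAD settings; only stop_secs is modeled here."""
    stop_secs: float = 0.8


@dataclass
class SpeechControlParamsFrame:
    """Stand-in for the frame described above (field names are assumed)."""
    vad_params: Optional[VADParams] = None
    turn_params: Optional[dict] = None


class TimeoutTracker:
    """Hypothetical downstream processor that mirrors the VAD stop time."""

    def __init__(self, aggregation_timeout: float = 0.5):
        self.aggregation_timeout = aggregation_timeout

    def process_frame(self, frame: object) -> None:
        # Only react to the params frame; every other frame is ignored.
        if isinstance(frame, SpeechControlParamsFrame) and frame.vad_params:
            self.aggregation_timeout = frame.vad_params.stop_secs


tracker = TimeoutTracker()
tracker.process_frame(SpeechControlParamsFrame(vad_params=VADParams(stop_secs=0.8)))
print(tracker.aggregation_timeout)
```

Because the frame carries the parameters themselves, a processor written this way does not need a direct reference to the transport or the VAD analyzer.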

Changed

Two package dependencies have been updated:
- numpy: versions 1.26.0 and newer are now supported
- transformers: versions 4.48.0 and newer are now supported

Fixed

Fixed an issue with RTVI's handling of append-to-context.
Fixed an issue where using audio input with a sample rate requiring resampling could result in empty audio being passed to STT services, causing errors.
Fixed the VAD analyzer to process the full audio buffer as long as it contains more than the minimum required bytes per iteration, instead of only analyzing the first chunk.
Fixed an issue in ParallelPipeline that caused errors when attempting to drain the queues.
Fixed an issue with emulated VAD timeout inconsistency in LLMUserContextAggregator. Previously, emulated VAD scenarios (where transcription is received without VAD detection) used a hardcoded aggregation_timeout (default 0.5s) instead of matching the VAD's stop_secs parameter (default 0.8s). This created different user experiences between real VAD and emulated VAD scenarios. Now, emulated VAD timeouts automatically synchronize with the VAD's stop_secs parameter.
Fixed a pipeline freeze when using AWS Nova Sonic, which would occur if the user started speaking early, while the bot was still working through trigger_assistant_response().
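
The VAD analyzer fix listed above amounts to consuming the audio buffer in a loop rather than looking at only its first chunk. The following is a minimal sketch of that pattern, assuming 16-bit little-endian mono audio. The function names and the toy amplitude-based scorer are hypothetical and are not taken from Pipecat. The loop condition uses "at least one full chunk", which is also an assumption about the exact boundary.

```python
from typing import Callable, List, Tuple


def process_buffer(
    buffer: bytes,
    chunk_bytes: int,
    analyze_chunk: Callable[[bytes], float],
) -> Tuple[List[float], bytes]:
    """Analyze every full chunk in buffer; return results and leftover bytes."""
    results: List[float] = []
    # Keep going while at least one full chunk remains, instead of
    # analyzing only the first chunk and ignoring the rest.
    while len(buffer) >= chunk_bytes:
        results.append(analyze_chunk(buffer[:chunk_bytes]))
        buffer = buffer[chunk_bytes:]
    # Any remainder shorter than one chunk is returned for the next call.
    return results, buffer


def mean_abs_amplitude(chunk: bytes) -> float:
    """Toy stand-in for a VAD model: mean absolute 16-bit sample value."""
    samples = [
        int.from_bytes(chunk[i:i + 2], "little", signed=True)
        for i in range(0, len(chunk), 2)
    ]
    return sum(abs(s) for s in samples) / len(samples)


audio = bytes(10)  # five silent 16-bit samples
scores, leftover = process_buffer(audio, chunk_bytes=4, analyze_chunk=mean_abs_amplitude)
print(scores, len(leftover))
```

Returning the leftover bytes lets the caller prepend them to the next incoming audio, so no samples are dropped between calls.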
