v0.0.76
Added
- Added SpeechControlParamsFrame, a new SystemFrame that notifies downstream processors of the VAD and Turn analyzer params. This frame is pushed by the BaseInputTransport at Start and any time a VADParamsUpdateFrame is received.
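The push behavior described above can be sketched roughly as follows. This is an illustrative model only: every class here is a hypothetical stand-in defined in the snippet (none is imported from Pipecat), and the field names `vad_params` and `turn_params` are assumptions rather than the library's confirmed API.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for illustration; these are NOT Pipecat's real classes.
@dataclass
class VADParams:
    stop_secs: float = 0.8


@dataclass
class StartFrame:
    pass


@dataclass
class VADParamsUpdateFrame:
    params: VADParams


@dataclass
class SpeechControlParamsFrame:
    vad_params: Optional[VADParams] = None  # assumed field name
    turn_params: Optional[dict] = None      # assumed field name


class SketchInputTransport:
    """Emits a SpeechControlParamsFrame at start and on every VAD params update."""

    def __init__(self, vad_params: VADParams):
        self.vad_params = vad_params
        # Stands in for frames pushed to downstream processors.
        self.pushed: List[SpeechControlParamsFrame] = []

    def _notify(self) -> None:
        self.pushed.append(SpeechControlParamsFrame(vad_params=self.vad_params))

    def process(self, frame) -> None:
        if isinstance(frame, StartFrame):
            self._notify()
        elif isinstance(frame, VADParamsUpdateFrame):
            # Store the new params first so the notification reflects them.
            self.vad_params = frame.params
            self._notify()


transport = SketchInputTransport(VADParams(stop_secs=0.8))
transport.process(StartFrame())
transport.process(VADParamsUpdateFrame(VADParams(stop_secs=0.5)))
print([f.vad_params.stop_secs for f in transport.pushed])
```

A downstream processor that caches the most recent frame of this kind always sees the current VAD settings without needing a reference to the transport.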
Changed
- Two package dependencies have been updated:
  - numpy now supports 1.26.0 and newer
  - transformers now supports 4.48.0 and newer
Fixed
- Fixed an issue with RTVI's handling of append-to-context.
- Fixed an issue where using audio input with a sample rate requiring resampling could result in empty audio being passed to STT services, causing errors.
- Fixed the VAD analyzer to process the full audio buffer as long as it contains more than the minimum required bytes per iteration, instead of only analyzing the first chunk.
- Fixed an issue in ParallelPipeline that caused errors when attempting to drain the queues.
- Fixed an issue with emulated VAD timeout inconsistency in LLMUserContextAggregator. Previously, emulated VAD scenarios (where transcription is received without VAD detection) used a hardcoded aggregation_timeout (default 0.5s) instead of matching the VAD's stop_secs parameter (default 0.8s), so the user experience differed between real and emulated VAD. Emulated VAD timeouts now synchronize automatically with the VAD's stop_secs parameter.
- Fixed a pipeline freeze when using AWS Nova Sonic, which would occur if the user started speaking early, while the bot was still working through trigger_assistant_response().
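To illustrate the VAD analyzer fix listed above (processing the whole buffer rather than only the first chunk), here is a minimal sketch of the looping pattern. It is not the Pipecat implementation: `analyze_full_buffer`, `min_bytes`, and the toy scoring function are hypothetical names, and returning leftover bytes for the next call is an assumption about how a remainder might be handled.

```python
from typing import Callable, List, Tuple


def analyze_full_buffer(
    buffer: bytes,
    min_bytes: int,
    analyze_chunk: Callable[[bytes], float],
) -> Tuple[List[float], bytes]:
    """Analyze every complete min_bytes-sized chunk; return scores and leftover bytes."""
    if min_bytes <= 0:
        raise ValueError("min_bytes must be positive")

    scores: List[float] = []
    offset = 0
    # Keep going while a full chunk remains, instead of stopping after the first one.
    while len(buffer) - offset >= min_bytes:
        scores.append(analyze_chunk(buffer[offset : offset + min_bytes]))
        offset += min_bytes

    # Bytes that do not fill a chunk are returned so the caller can prepend
    # them to the next incoming audio (an assumption of this sketch).
    return scores, buffer[offset:]


def nonzero_fraction(chunk: bytes) -> float:
    """Toy stand-in for a real voice-confidence model."""
    return sum(1 for b in chunk if b) / len(chunk)


audio = bytes([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
scores, leftover = analyze_full_buffer(audio, 4, nonzero_fraction)
print(scores, leftover)
```

With a 10-byte buffer and 4-byte chunks, two chunks are analyzed and two bytes carry over, whereas analyzing only the first chunk would have ignored the second chunk entirely.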