
[vad] Previous response is returned by server if no voice activity is detected in the sample #3250

Closed
@tannisroot


Hi, I'm using whisper.cpp together with whisper-api. I've tried the new VAD feature that @danbev added to the whisper.cpp server component. It works well for samples that contain voice, but there is a problem with audio that contains no voice.
When no voice is detected in a sample, the server returns the transcript of the last sample in which voice was successfully transcribed.
For example:

whisper-cpp-1  | whisper_vad_segments_from_probs: detecting speech timestamps using 145 probabilities
whisper-cpp-1  | whisper_vad_segments_from_probs: Final speech segments after filtering: 0
wyoming-api-1  | INFO:httpx:HTTP Request: POST http://whispercpp:8910/inference?temperature=0.0&temperature_inc=0.2&response_format=json "HTTP/1.1 200 OK"
wyoming-api-1  | INFO:wyoming_whisper_api_client.handler: set a timer for 30 minutes
wyoming-api-1  | 

The behavior I would expect instead is for the transcript to be empty, since no voice was detected.
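To make the expected behavior concrete, here is a minimal Python sketch contrasting what I'm seeing with what I'd expect. It assumes, without having checked the whisper.cpp server source, that the symptom comes from a transcript being kept between requests and returned unchanged when VAD finds zero speech segments. `HypotheticalInferenceServer`, `detect_speech` and `transcribe` are hypothetical stand-ins, not whisper.cpp APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (start_seconds, end_seconds) of one detected speech segment
Segment = Tuple[float, float]


@dataclass
class HypotheticalInferenceServer:
    """Toy model of a server that runs VAD before transcription.

    detect_speech and transcribe are hypothetical stand-ins for the
    VAD and ASR steps; they are NOT whisper.cpp functions.
    """
    detect_speech: Callable[[bytes], List[Segment]]
    transcribe: Callable[[bytes, List[Segment]], str]
    last_text: str = ""  # state that survives between requests

    def infer_buggy(self, audio: bytes) -> str:
        segments = self.detect_speech(audio)
        if segments:
            self.last_text = self.transcribe(audio, segments)
        # With zero segments, the stale result of the previous request leaks out.
        return self.last_text

    def infer_expected(self, audio: bytes) -> str:
        segments = self.detect_speech(audio)
        if not segments:
            # No speech detected: the transcript should simply be empty.
            return ""
        return self.transcribe(audio, segments)


if __name__ == "__main__":
    server = HypotheticalInferenceServer(
        # Pretend any non-empty byte string contains one second of speech.
        detect_speech=lambda audio: [(0.0, 1.0)] if audio else [],
        transcribe=lambda audio, segments: "set a timer for 30 minutes",
    )
    print(repr(server.infer_buggy(b"speech")))  # 'set a timer for 30 minutes'
    print(repr(server.infer_buggy(b"")))        # 'set a timer for 30 minutes' (stale)
    print(repr(server.infer_expected(b"")))     # ''
```

In other words, a request whose VAD pass yields "Final speech segments after filtering: 0" should produce an empty transcript instead of repeating whatever the previous request returned.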
