Closed
Description
Hi, I'm using whisper.cpp in conjunction with whisper-api. I've tried the new VAD feature introduced into the server component of whisper.cpp by @danbev, and while it works well with samples that contain voice, there is an issue with audio that contains no voice.
It seems that if no voice is detected in the sample, the server just returns the transcript of the last voice sample that was successfully transcribed.
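For example, in the log below the VAD reports 0 speech segments, yet the client still receives a transcript ("set a timer for 30 minutes"):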
whisper-cpp-1 | whisper_vad_segments_from_probs: detecting speech timestamps using 145 probabilities
whisper-cpp-1 | whisper_vad_segments_from_probs: Final speech segments after filtering: 0
wyoming-api-1 | INFO:httpx:HTTP Request: POST http://whispercpp:8910/inference?temperature=0.0&temperature_inc=0.2&response_format=json "HTTP/1.1 200 OK"
wyoming-api-1 | INFO:wyoming_whisper_api_client.handler: set a timer for 30 minutes
wyoming-api-1 |
The behavior I would expect instead is for the transcript to be empty, since no voice was detected.
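To make the expected behavior concrete, here is a minimal Python sketch. It is not whisper.cpp code: `SketchServer`, `infer`, and `last_segments` are hypothetical names, and the idea that the stale transcript comes from per-request state not being cleared is only an assumption about the cause. The sketch clears the state at the start of every request and returns an empty transcript when the VAD step yields no speech segments.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchServer:
    """Hypothetical stand-in for the server; not the real whisper.cpp API."""

    # Segments kept from the most recent request (assumed source of the stale text).
    last_segments: List[str] = field(default_factory=list)

    def infer(self, speech_segments: List[str]) -> Dict[str, str]:
        # Clear per-request state first, so an early return cannot
        # leak the previous request's transcript.
        self.last_segments = []
        if not speech_segments:
            # VAD found no speech: report an empty transcript.
            return {"text": ""}
        self.last_segments = list(speech_segments)
        return {"text": " ".join(self.last_segments)}


server = SketchServer()
first = server.infer(["set a timer for 30 minutes"])   # request with speech
second = server.infer([])                              # request with no speech
print(first, second)
```

With this ordering, the second request returns `{"text": ""}` rather than repeating the text from the first one.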