Question about Silero VAD implementation and chunking #882

ngcheeyuan · 2024-06-27T00:55:20Z

I can't quite find information on how VAD is implemented in faster-whisper. Some details would help.
For example:

Do you chop the audio into 30-second chunks, run VAD on each of those chunks, remove the silent portions if they are below the minimum silence threshold, and then transcribe them?

Or is something similar to WhisperX being done?

https://github.com/m-bain/whisperX

WhisperX merges smaller chunks before transcribing them.

Thanks.
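
To make the second option concrete, here is a minimal sketch of what "merging smaller chunks" could look like. It is not WhisperX's or faster-whisper's actual code; the function name `merge_segments`, the 30-second limit as a parameter, and the sample timestamps are all assumptions for illustration.

```python
from typing import List, Tuple

# A speech segment as (start_seconds, end_seconds), e.g. as produced by a VAD.
Segment = Tuple[float, float]


def merge_segments(segments: List[Segment], max_chunk_s: float = 30.0) -> List[Segment]:
    """Greedily merge consecutive speech segments.

    A segment is folded into the current chunk as long as the chunk,
    measured from its first start to the new end, stays within max_chunk_s.
    Hypothetical helper: illustrates the idea only.
    """
    merged: List[Segment] = []
    for start, end in segments:
        if merged and end - merged[-1][0] <= max_chunk_s:
            # Extend the current chunk to cover this segment.
            merged[-1] = (merged[-1][0], end)
        else:
            # Start a new chunk.
            merged.append((start, end))
    return merged


# Made-up VAD output for illustration.
speech = [(0.5, 4.0), (5.2, 12.0), (14.0, 29.0), (31.0, 40.0), (58.0, 65.0)]
print(merge_segments(speech))
# → [(0.5, 29.0), (31.0, 40.0), (58.0, 65.0)]
```

Each merged chunk would then be transcribed as one unit instead of transcribing many short pieces separately.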

trungkienbkhn · 2024-06-27T06:23:53Z

Maintainer

@ngcheeyuan, hello. You can enable Silero VAD by setting the option vad_filter=True when calling transcribe. The filtering is then handled inside the library's transcription logic.
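
For reference, a minimal sketch of enabling the option. `WhisperModel`, `transcribe`, `vad_filter`, and `vad_parameters` with `min_silence_duration_ms` come from the faster-whisper README; the helper names `vad_transcribe_kwargs` and `transcribe_with_vad`, the model size, the device settings, and the 500 ms value are assumptions for illustration.

```python
def vad_transcribe_kwargs(min_silence_ms: int = 500) -> dict:
    """Build keyword arguments that turn on the Silero VAD filter.

    Hypothetical helper; the 500 ms default is only an example value.
    """
    return {
        "vad_filter": True,
        "vad_parameters": {"min_silence_duration_ms": min_silence_ms},
    }


def transcribe_with_vad(audio_path: str, model_size: str = "small"):
    """Transcribe a file with the VAD filter enabled (hypothetical wrapper)."""
    # Imported here so the helper above can be used without the package installed.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU inference with int8 quantization.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, **vad_transcribe_kwargs())
    # `segments` is a generator; iterating over it runs the transcription.
    return [(segment.start, segment.end, segment.text) for segment in segments]
```

Calling `transcribe_with_vad("audio.wav")` (a placeholder path) would return a list of `(start, end, text)` tuples, with the timestamps reported by the library.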
