
Batch whisper inference #1525

Open
thewh1teagle opened this issue Nov 11, 2024 · 5 comments

Comments

@thewh1teagle
Contributor

The Whisper model has a 30-second input limitation.
Can you integrate batch inference into sherpa?
I would like to use it along with diarization.

I'm still not sure exactly how to batch it, but I have an idea (a rough sketch follows below):
use silero-vad and aggregate segments into 30-second chunks (when they are shorter);
add silence between them;
using word timestamps, estimate where the silence was added and reconstruct each segment's text.

thewh1teagle/loud.cpp#11
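A minimal sketch of the packing step described above. This is not an existing sherpa-onnx API; the segment format (start time plus 16 kHz float32 samples) and the 0.5 s gap are assumptions made only for illustration.

```python
import numpy as np

SAMPLE_RATE = 16000
MAX_CHUNK_SEC = 30.0
GAP_SEC = 0.5  # silence inserted between packed segments (assumed value)


def pack_segments(segments):
    """Pack short VAD segments into ~30 s chunks separated by silence.

    segments: list of (start_sec, samples) tuples, each shorter than 30 s.
    Returns a list of (audio, offsets) pairs, where `offsets` records where
    each original segment starts inside the packed chunk, so that word
    timestamps can later be mapped back to the source audio.
    """
    gap = np.zeros(int(GAP_SEC * SAMPLE_RATE), dtype=np.float32)
    chunks = []
    cur_audio, cur_offsets, cur_len = [], [], 0.0

    for start_sec, samples in segments:
        seg_len = len(samples) / SAMPLE_RATE
        # flush the current chunk if this segment would push it past 30 s
        if cur_len + seg_len + GAP_SEC > MAX_CHUNK_SEC and cur_audio:
            chunks.append((np.concatenate(cur_audio), cur_offsets))
            cur_audio, cur_offsets, cur_len = [], [], 0.0
        # remember where this segment starts inside the packed chunk
        cur_offsets.append({"orig_start": start_sec, "chunk_start": cur_len})
        cur_audio.append(samples)
        cur_audio.append(gap)
        cur_len += seg_len + GAP_SEC

    if cur_audio:
        chunks.append((np.concatenate(cur_audio), cur_offsets))
    return chunks
```

After transcribing each packed chunk with word-level timestamps, a word whose timestamp falls between a segment's `chunk_start` and `chunk_start` plus that segment's length can be assigned back to that segment, so the per-segment text can be reconstructed and aligned with the diarization output.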

@thewh1teagle
Contributor Author

@csukuangfj
Collaborator

If you are using CPU, it won't make much difference in speed.

@thewh1teagle
Contributor Author

> If you are using CPU, it won't make much difference in speed.

If we process speaker sentences of 5 seconds each, Whisper will still process each one as a 30-second window, no?
Also, a GPU is very important with Whisper because it's a heavy model, and it makes a big difference there.
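A rough illustration of this point, assuming the standard Whisper front end that pads or trims audio to a fixed 30 s window at 16 kHz (details may differ per implementation):

```python
import numpy as np

SAMPLE_RATE = 16000
WINDOW_SEC = 30


def pad_to_window(samples: np.ndarray) -> np.ndarray:
    """Pad (or trim) audio to exactly 30 s, as Whisper's mel front end expects."""
    target = SAMPLE_RATE * WINDOW_SEC
    if len(samples) >= target:
        return samples[:target]
    return np.pad(samples, (0, target - len(samples)))


five_sec = np.zeros(5 * SAMPLE_RATE, dtype=np.float32)
padded = pad_to_window(five_sec)
print(len(padded) / SAMPLE_RATE)  # 30.0 -> the encoder always sees a 30 s window
```

So feeding many short, unpacked segments means paying the cost of a full 30-second encoder pass each time, which is why packing several segments into one window can help.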

@csukuangfj
Collaborator

> If we process speaker sentences of 5 seconds each, Whisper will still process each one as a 30-second window, no?

I suggest that you have a look at the Moonshine models. They do not require padding.

@thewh1teagle
Contributor Author

> I suggest that you have a look at the Moonshine models. They do not require padding.

Unfortunately, Moonshine supports only English.
