Batch whisper inference #1525
Comments
If you are using the CPU, it won't make much difference in speed.
If we process speaker sentences of 5 seconds each, it will still process each one as 30 seconds, no?
I suggest that you have a look at the Moonshine models. They do not require padding.
Unfortunately, Moonshine supports only English.
The Whisper model has a 30-second limitation.
Can you integrate batch inference into sherpa?
I would like to use it along with the diarization.
I'm still not sure exactly how to batch it, but I have an idea (a rough sketch follows the list):
use silero-vad and aggregate segments into 30-second chunks (when they are shorter)
add silence between them
using word timestamps, estimate where the silence was added and reconstruct the original segments' text
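Here is a minimal sketch of that packing/unpacking idea, not the sherpa-onnx API: VAD segments are concatenated into chunks of at most 30 seconds with short silence gaps, and the word timestamps returned for each chunk are mapped back to the original segments. The constants and the `transcribe_with_word_timestamps` call are hypothetical stand-ins for whatever Whisper frontend you use.

```python
import numpy as np

SAMPLE_RATE = 16000
MAX_CHUNK_S = 30.0   # Whisper's fixed input window
GAP_S = 0.5          # silence inserted between packed segments (assumed value)

def pack_segments(segments):
    """Pack (audio, start, end) VAD segments into chunks of <= 30 s.

    Returns a list of (chunk_audio, layout) pairs, where layout records the
    offset and duration of each original segment inside the packed chunk.
    Segments longer than 30 s would need to be split before packing.
    """
    chunks, cur_audio, cur_layout, cur_len = [], [], [], 0.0
    gap = np.zeros(int(GAP_S * SAMPLE_RATE), dtype=np.float32)
    for seg_audio, seg_start, seg_end in segments:
        seg_len = len(seg_audio) / SAMPLE_RATE
        # Flush the current chunk if the next segment would not fit.
        if cur_audio and cur_len + GAP_S + seg_len > MAX_CHUNK_S:
            chunks.append((np.concatenate(cur_audio), cur_layout))
            cur_audio, cur_layout, cur_len = [], [], 0.0
        if cur_audio:
            cur_audio.append(gap)
            cur_len += GAP_S
        cur_layout.append({"offset": cur_len, "duration": seg_len,
                           "orig_start": seg_start, "orig_end": seg_end})
        cur_audio.append(seg_audio)
        cur_len += seg_len
    if cur_audio:
        chunks.append((np.concatenate(cur_audio), cur_layout))
    return chunks

def unpack_words(words, layout):
    """Assign decoded (word, chunk_relative_start_time) pairs back to the
    original segment whose packed time range contains each word."""
    per_segment = [[] for _ in layout]
    for word, t in words:
        for i, seg in enumerate(layout):
            if seg["offset"] <= t < seg["offset"] + seg["duration"]:
                per_segment[i].append(word)
                break
    return per_segment

# Hypothetical usage: each packed chunk is decoded once by Whisper, and the
# word-level timestamps are used to split the text back per segment.
# for chunk_audio, layout in pack_segments(vad_segments):
#     words = transcribe_with_word_timestamps(chunk_audio)  # not a real API
#     texts = [" ".join(w) for w in unpack_words(words, layout)]
```

This only saves time if decoding a packed 30-second chunk is cheaper than decoding each short segment padded to 30 seconds individually, which matches the point made above about GPU vs. CPU.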
thewh1teagle/loud.cpp#11