
Batch whisper inference #1525

Open
thewh1teagle opened this issue Nov 11, 2024 · 5 comments

Comments

@thewh1teagle
Contributor

The Whisper model has a 30-second input limitation.
Can you integrate batch inference into sherpa?
I would like to use it along with diarization.

I'm still not sure exactly how to batch it, but I have an idea (a rough sketch follows below):
use silero-vad and aggregate segments into 30-second chunks (when they are shorter);
add silence between them;
using word timestamps, estimate where the silence was added and reconstruct each segment's text.

thewh1teagle/loud.cpp#11
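A minimal sketch of the packing step described above. This is not an existing sherpa-onnx API; the segment format (start time plus 16 kHz float32 samples) and the 0.5 s gap are assumptions made only for illustration.

```python
import numpy as np

SAMPLE_RATE = 16000
MAX_CHUNK_SEC = 30.0
GAP_SEC = 0.5  # silence inserted between packed segments (assumed value)


def pack_segments(segments):
    """Pack short VAD segments into ~30 s chunks separated by silence.

    segments: list of (start_sec, samples) tuples, each shorter than 30 s.
    Returns a list of (audio, offsets) pairs, where `offsets` records where
    each original segment starts inside the packed chunk, so that word
    timestamps can later be mapped back to the source audio.
    """
    gap = np.zeros(int(GAP_SEC * SAMPLE_RATE), dtype=np.float32)
    chunks = []
    cur_audio, cur_offsets, cur_len = [], [], 0.0

    for start_sec, samples in segments:
        seg_len = len(samples) / SAMPLE_RATE
        # flush the current chunk if this segment would push it past 30 s
        if cur_len + seg_len + GAP_SEC > MAX_CHUNK_SEC and cur_audio:
            chunks.append((np.concatenate(cur_audio), cur_offsets))
            cur_audio, cur_offsets, cur_len = [], [], 0.0
        # remember where this segment starts inside the packed chunk
        cur_offsets.append({"orig_start": start_sec, "chunk_start": cur_len})
        cur_audio.append(samples)
        cur_audio.append(gap)
        cur_len += seg_len + GAP_SEC

    if cur_audio:
        chunks.append((np.concatenate(cur_audio), cur_offsets))
    return chunks
```

After transcribing each packed chunk with word-level timestamps, a word whose timestamp falls between a segment's `chunk_start` and `chunk_start` plus that segment's length can be assigned back to that segment, so the per-segment text can be reconstructed and aligned with the diarization output.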

@thewh1teagle
Contributor Author

@csukuangfj
Collaborator

If you are using CPU, it won't make much difference in speed.

@thewh1teagle
Contributor Author

> If you are using CPU, it won't make much difference in speed.

If we process speaker sentences of 5 seconds each, Whisper will still process each one as a 30-second window, no?
Also, a GPU is very important with Whisper because it's a heavy model, and it makes a big difference there.
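A rough illustration of this point, assuming the standard Whisper front end that pads or trims audio to a fixed 30 s window at 16 kHz (details may differ per implementation):

```python
import numpy as np

SAMPLE_RATE = 16000
WINDOW_SEC = 30


def pad_to_window(samples: np.ndarray) -> np.ndarray:
    """Pad (or trim) audio to exactly 30 s, as Whisper's mel front end expects."""
    target = SAMPLE_RATE * WINDOW_SEC
    if len(samples) >= target:
        return samples[:target]
    return np.pad(samples, (0, target - len(samples)))


five_sec = np.zeros(5 * SAMPLE_RATE, dtype=np.float32)
padded = pad_to_window(five_sec)
print(len(padded) / SAMPLE_RATE)  # 30.0 -> the encoder always sees a 30 s window
```

So feeding many short, unpacked segments means paying the cost of a full 30-second encoder pass each time, which is why packing several segments into one window can help.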

@csukuangfj
Collaborator

> If we process speaker sentences of 5 seconds each, Whisper will still process each one as a 30-second window, no?

I suggest that you have a look at the Moonshine models. They do not require padding.

@thewh1teagle
Contributor Author

> I suggest that you have a look at the Moonshine models. They do not require padding.

Unfortunately, Moonshine supports only English.
