Replies: 1 comment 1 reply
-
where did you read that? it's the other way round, audio files longer than 30s are processed in 30s chunks. nothing changes below 30s. my dangerous half-knowledge on the subject |
Beta Was this translation helpful? Give feedback.
1 reply
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Hello. Recently I ran a benchmark on a fine-tuned whisper-small model. Most audio files were pretty short, below 10s. The results are as follows, where the y-axis is the time to transcribe using faster-whisper, and the x-axis is the audio duration in seconds. My CPU is an AMD Ryzen 5 7600X (12) @ 5.45 GHz.
I used the following config. The model is loaded quantized (int8) and with
compute_type="int8"
.I thought that, since whisper pads audios shorter than 30s to 30s, every file would have more or less the same time to transcribe. But that's not the case. Why?
Beta Was this translation helpful? Give feedback.
All reactions