I've compared the transcription speed on an AMD Ryzen 5950X CPU with and without batching. faster-whisper runs inside a single Docker container on a 5-minute mp4 file, with the turbo model cached locally.
v1.1.0 is a few seconds slower without batching, and batching doesn't improve the speed.
Results:

| Version | Precision | Beam size | Time |
|---|---|---|---|
| v1.0.3 (cpu_threads=4) | int8 | 1 | 55s |
| v1.1.0 (cpu_threads=4) | int8 | 1 | 1m2s |
| v1.1.0 (cpu_threads=4, batch_size=4) | int8 | 1 | 55s |
Without batching:

```python
from faster_whisper import WhisperModel

model_size = "turbo"
model = WhisperModel(model_size, device="cpu", compute_type="int8", cpu_threads=4)

segments, info = model.transcribe("test.mp4", beam_size=1, vad_filter=True, task="transcribe")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
With batching:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model_size = "turbo"
model = WhisperModel(model_size, device="cpu", compute_type="int8", cpu_threads=4)
batched_model = BatchedInferencePipeline(model=model)

segments, info = batched_model.transcribe("test.mp4", beam_size=1, vad_filter=True, task="transcribe", batch_size=4)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
Feel free to reference this repository — it's designed for CPU-only users. With an Intel i3-12300 CPU and Whisper-Large-V3-Turbo, it takes just 15 minutes to generate subtitles for a 2-hour movie. Additionally, the faster SenseVoiceSmall model can transcribe the same movie in just 7 minutes. This tool also includes a VAD and a denoiser to improve subtitle accuracy.