Added multiprocessing for cpu processing #648
Conversation
The branch was force-pushed from dd68247 to 47e14c8.
Does this have any actual impact on performance? Do you have benchmarks?
Yes! I can send my data and test case later today.
Testing code:

```python
from faster_whisper import WhisperModel, decode_audio
import nvtx

def preprocess_audio(filename):
    model = WhisperModel(
        "large-v3",
        device="cuda",
        device_index=[0],
        compute_type="bfloat16",
        cpu_threads=2,
        num_workers=2,
    )
    ...

def transcribe(model_to_use):
    ...

# this is to clear out memory from the GPUs
if __name__ == "__main__":
    ...
```

Results:

- Overall time to pre-process 20 requests without multicore: 2.7506766319274902 seconds
- Overall time to pre-process 20 requests with multicore: 1.9269721508026123 seconds

Now to test the overhead for a single request:

- Overall time to pre-process 1 request without multicore: 0.21215391159057617 seconds
- Overall time to pre-process 1 request with multicore:

So there's a tradeoff between the per-request overhead of spawning the worker process and the throughput gain under concurrent load.
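Since the test script above is truncated, here is a self-contained sketch of how such a measurement could be reproduced. The audio file name, request count, and the thread-based simulation of simultaneous requests are assumptions, not the author's exact harness:

```python
# Hypothetical timing harness: simulates N simultaneous requests, each
# performing the CPU-bound audio decoding step. "sample.wav" and the
# request count are placeholders, not values from the PR.
import time
from concurrent.futures import ThreadPoolExecutor

from faster_whisper import decode_audio

N_REQUESTS = 20
AUDIO_FILE = "sample.wav"  # placeholder input

def preprocess(_: int):
    # decode_audio is the CPU-bound preprocessing this PR parallelizes
    return decode_audio(AUDIO_FILE, sampling_rate=16000)

if __name__ == "__main__":
    start = time.time()
    with ThreadPoolExecutor(max_workers=N_REQUESTS) as pool:
        list(pool.map(preprocess, range(N_REQUESTS)))
    elapsed = time.time() - start
    print(f"Overall time to pre-process {N_REQUESTS} requests: {elapsed:.4f} seconds")
```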
@joiemoie, hello. Thanks for an interesting pull request.
That's a pretty significant improvement!

```python
if not isinstance(audio, np.ndarray):
    audio = decode_audio(
        audio, sampling_rate=feature_extractor.sampling_rate
    )

if vad_filter:
    if vad_parameters is None:
        vad_parameters = VadOptions()
    elif isinstance(vad_parameters, dict):
        vad_parameters = VadOptions(**vad_parameters)
```

The overall time was 9.633s after my change. I think the logic in the
Nice! That's not a bad idea. Please don't merge this in for now: I noticed a memory inefficiency, and the pool size needs to be capped or exposed as a parameter. I'm still investigating the memory inefficiency.
@joiemoie, hello. Have you finished your work yet? 😃
```diff
@@ -264,56 +317,43 @@ def transcribe(
           https://github.com/snakers4/silero-vad.
         vad_parameters: Dictionary of Silero VAD parameters or VadOptions class (see available
           parameters and default values in the class `VadOptions`).
+        preprocess_on_multiple_cores: If preprocess_on_multiple_cores is True, multiple
+          CPU based workloads will run on different cores. This will slightly increse overhead
+          for single requests but improve performance for multiple simulatenous requests.
```
typo ^_^ (all looks very interesting!)
Suggested change:

```diff
-          for single requests but improve performance for multiple simulatenous requests.
+          for single requests but improve performance for multiple simultaneous requests.
```
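Based on the docstring in the diff above, usage would presumably look like the sketch below. The `preprocess_on_multiple_cores` flag comes from this PR; the audio path is a placeholder, and the rest is standard faster-whisper API:

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# preprocess_on_multiple_cores is the parameter added by this PR;
# "sample.wav" is a placeholder input, not from the PR.
segments, info = model.transcribe(
    "sample.wav",
    vad_filter=True,
    preprocess_on_multiple_cores=True,
)
for segment in segments:
    print(segment.start, segment.end, segment.text)
```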
Because of the Python GIL, the preprocessing can't make efficient use of all the CPU cores from within a single process. By spawning the CPU-bound tasks in their own worker processes, requests arriving on different threads can fully utilize the available cores.
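To make the pattern concrete, here is a minimal sketch of offloading CPU-bound decoding to a process pool so concurrent requests sidestep the GIL. This is not the PR's actual implementation; the pool size, helper name, and audio path are assumptions:

```python
# Minimal sketch: run decode_audio in worker processes, each with its own
# interpreter and GIL, instead of on the caller's thread.
import os
from concurrent.futures import ProcessPoolExecutor

from faster_whisper import decode_audio

def offloaded_decode(pool: ProcessPoolExecutor, path: str):
    # The CPU-bound work happens in a worker process; result() blocks the
    # calling thread but not other threads' workers.
    return pool.submit(decode_audio, path, sampling_rate=16000).result()

if __name__ == "__main__":
    # The __main__ guard matters on spawn-based platforms (Windows, and
    # macOS by default), where workers re-import this module.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        samples = offloaded_decode(pool, "sample.wav")  # placeholder path
        print(samples.shape, samples.dtype)
```

One design note: a process pool pays a one-time spawn cost plus per-call pickling of arguments and results, which matches the benchmark above showing a small penalty for a single request but a clear win for many simultaneous ones.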