Concurrent requests #406
-
Hi, see the section "Running multiple transcriptions in parallel" in #100 (comment), which shows how to initialize multiple model workers that can process requests in parallel.
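For reference, here is a minimal sketch of that pattern, assuming two GPUs and two workers per replica as discussed below; the model size, file names, and pool size are illustrative assumptions, not values from the linked comment:

```python
from concurrent.futures import ThreadPoolExecutor

from faster_whisper import WhisperModel

# device_index=[0, 1] replicates the model on both GPUs;
# num_workers=2 allows multiple concurrent transcriptions per replica.
model = WhisperModel(
    "large-v2",
    device="cuda",
    device_index=[0, 1],
    compute_type="float16",
    num_workers=2,
)

def transcribe(path):
    segments, info = model.transcribe(path)
    # segments is a lazy generator: it must be consumed in this thread
    # for the transcription to actually run.
    return [segment.text for segment in segments]

# Placeholder inputs; four threads to match the four worker slots
# (2 GPUs x 2 workers) described in the reply below.
audio_files = ["audio1.wav", "audio2.wav", "audio3.wav", "audio4.wav"]
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(transcribe, audio_files))
```

Each call to `transcribe` must fully consume the segments generator inside its own thread; otherwise the work is deferred and nothing runs in parallel.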
-
@guillaumekln I set up an instance with 2 GPUs, using device_index=[0, 1] and num_workers=2, so it should handle 4 requests at a time, and in the best scenario it does. The problem is that the overall time taken by the app is the same as when I run it on two single-GPU instances with gunicorn (the API is a Flask app). I have observed that it initially takes time to respond to the first request, which is why there is no time difference. Please help me clarify this. Thanks and regards
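The slow first request described here is consistent with one-time startup costs (loading weights onto the GPUs, CUDA context creation), which a common mitigation is to pay before serving traffic. A minimal warm-up sketch, assuming the same model configuration as above and a short placeholder clip `warmup.wav` shipped with the app:

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", device_index=[0, 1],
                     compute_type="float16", num_workers=2)

# Run one dummy transcription at startup so the first real request
# does not pay the one-time initialization cost. "warmup.wav" is a
# placeholder for any short audio clip.
segments, _ = model.transcribe("warmup.wav")
list(segments)  # consume the lazy generator so the model actually runs once
```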
-
Hi all,
Is there any benchmark or literature (for me to understand) on the number of simultaneous (concurrent) requests faster-whisper can process, especially on GPUs? In my use case, I will typically be sending short sentences/phrases.
Which parameter will let me control simultaneous/concurrent use?
cheers,