faster-whisper integration ? #6816
How does faster-whisper's speed [on GPU] compare to whisper-ConstMe? |
I asked about whisper-ConstMe, not "openai/whisper". whisper-ConstMe can use any model. |
Full benchmarks: faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models. This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU. The benchmark against openai/whisper@6dea21fd covers the large-v2 model on GPU (executed with CUDA 11.7.1 on an NVIDIA Tesla V100S) and the small model on CPU (executed with 8 threads on an Intel(R) Xeon(R) Gold 6226R). |
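For anyone who wants to try it outside SubtitleEdit first, here is a minimal sketch using the faster-whisper Python package; the model name, audio path, and beam size are just illustrative values, and the API shown is the one documented in the project's README at the time of writing, so treat this as a starting point rather than the integration itself:

```python
# Minimal faster-whisper sketch (assumes `pip install faster-whisper` and a CUDA GPU;
# "audio.wav" and the model size are placeholders).
from faster_whisper import WhisperModel

# Download/load the CTranslate2 conversion of Whisper large-v2 with fp16 weights.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus language info;
# iterating over the segments is what actually runs the decoding.
segments, info = model.transcribe("audio.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```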
If it ran for me I wouldn't ask. |
Because this would help the software? Why would we not integrate faster-whisper? Why are you so adversarial to contributions on an open-source project? |
I asked a question; you answered with some irrelevant posts. I'm adversarial to nonsense... |
I think const-me/whisper is a Windows port of the whisper.cpp implementation. |
Are you a GPT-3 bot tuned on 4chan? |
at least 5x faster on CPU and 10x faster if you use GPU |
So, a few minutes ago you didn't know what whisper-ConstMe is, and now you are posting "benchmarks" out of your ass...
If you are not a bot then clearly you have some mental deficiency. |
I see... Take ten deep breaths and no need to type more posts. |
I'm not sure why this post devolved into insults instead of mutual understanding. whisper-ConstMe is a GPU-enabled implementation of Whisper. Does faster-whisper provide any benefits in terms of speed, accuracy, or GPU RAM usage compared to it? ConstMe is already integrated into SubtitleEdit, which is why the question is relevant. |
Yes, it does go about 5x faster with the optimizations they provide; that is what the benchmarks I posted show, and they are on the faster-whisper GitHub. You can also use Whisper-cTranslate2 directly. |
we also save a lot of VRAM, which means we can run large-v2 on 4 GB GPUs |
maybe for English, medium is fine, but for multilingual use large-v2 is a lot more useful than having to download a specific model for each language |
Const-Me also has huge speed boosts over CPU-only implementations. I'll
assume ConstMe and Faster Whisper are comparable unless someone reports
data to the contrary.
|
can you provide a benchmark that shows this ? |
I am talking about 5x speed on GPU vs GPU, not CPU vs GPU |
Const-me is a GPU implementation of CPP that is much faster.
David M's experience here:
https://www.youtube.com/watch?v=RRF5AS6JVtI&list=PLG8jlFKr-RtdO_r3YAp9cncEEqJRkIltB&index=83
I used CPP on my GTX 1050/Intel i7-7700HQ laptop on a 2:35 minute file in
13 seconds (tiny.en model).
ConstMe was about 5 seconds
With the Base model and the same 2:35 file:
Const-me: 8 seconds
CPP: 22 seconds
OpenAI (Python, no GPU): 52 seconds
Larger models may show more difference but my GPU only has 4GB ram.
I don't see a need to test Faster Whisper unless it gets embedded with
SubtitleEdit. Feel free to test it and report results.
…On Wed, Apr 12, 2023 at 9:59 PM ras0k ***@***.***> wrote:
Const-Me is whisper.cpp which is a CPU-only implementation, no?
whisper.cpp is in the benchmark
|
I understand and respect your desire not to ponder on this, but for me a tiny benchmark is completely useless; I am only talking about comparing whisper (in GPU mode) and faster-whisper (also GPU mode) on large-v2, because I believe there will be a use case for a lot of users. I will do my best to provide the benchmarks that you asked for soon. |
you can already try large-v2 on faster-whisper with your GPU, is that not incentive enough to want it ? |
You are the one who wants this implementation of Whisper to be included yet
you haven't provided any test data to show how it is better than the
current options.
I care about workflow, not absolute speed. If it's not in SubtitleEdit or
integrated into my NLE I don't have any reason to use it as it will slow me
down.
How much VRAM do you need for the large v2 model in Faster Whisper? That
may limit its interest to users.
|
3.09 GB |
faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models. This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU. I posted the full benchmarks up there in my first reply, but I will also try SubtitleEdit and post my results soon. |
Is that the needed VRAM or the model size? I can't run the medium model (1.5 GB) on my 4 GB GPU, FWIW.
…On Wed, Apr 12, 2023 at 11:41 PM ras0k ***@***.***> wrote:
3.09 GB
https://huggingface.co/guillaumekln/faster-whisper-large-v2/tree/main
|
Oh, sorry, yes, that's the model size. I am not sure about VRAM use, I will check right now, but if you take the time to read the benchmarks I posted, they say 4.8 GB or 3.1 GB depending on fp16 or int8 (large-v2 model on GPU, executed with CUDA 11.7.1 on an NVIDIA Tesla V100S). |
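To make the fp16 vs int8 difference concrete, a small sketch, assuming faster-whisper's documented compute_type option; the ~4.8 GB and ~3.1 GB figures come from the project's posted benchmark, not from running this snippet:

```python
# Sketch: the same model loaded with fp16 weights vs 8-bit quantized weights.
# compute_type strings are the ones documented by CTranslate2/faster-whisper.
from faster_whisper import WhisperModel

model_fp16 = WhisperModel("large-v2", device="cuda", compute_type="float16")       # ~4.8 GB VRAM per the posted benchmark
model_int8 = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")  # ~3.1 GB VRAM per the posted benchmark
```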
just FYI, right now I am testing the large model on ConstMe and it's about 4.2 GB, so medium should run on your 4 GB GPU; medium shows about 2.3 GB usage max |
my GPU is a 2060 6GB |
for English, medium.en is fine, but for French not even large works, so I really need large-v2 for its multilingual capacities |
do you want me to compare the speed of ConstMe vs Faster-Whisper on large just for benchmarking purposes ? |
Nice! I would be interested in tests on longer sample and medium model. Btw, do you find results from large-v2 model valuably better in comparison to medium? |
Btw, by default it sets threads to max real cores, it's probably not healthy on CPUs with a lot of cores. |
Longer sample? It was about a 90 minute source video.
Yes, for a few short samples I did, large does much better with proper names, especially unusual ones.
…On Fri, Apr 21, 2023 at 1:01 PM Purfview ***@***.***> wrote:
1:30 min transcription about 4 minutes vs 7 or so for Const-me using large
v2 for both. I tested it through SubtitleEdit beta, not from the command
line.
Edit: and Japanese works!
Nice! I would be interested in tests on longer sample and medium model.
Btw, do you find results from large-v2 model valuably better in
comparison to medium?
|
Oh, I misread it. That's impressive performance you have there. :) |
Yes, it was surprisingly quick! The RTX2080 Super was running at about
100%.
For CPU I need to disable CUDA? Can you give me a command line setting to
do what you need re: cores and threads?
I have a 14 core CPU (13600K) and don't mind running it at 100%
(I was using Subtitle Edit with CPP and running 3 at a time to max out the
CPU- about 200W for ~10 hours at a time. No overheating).
|
Yes use CPU, and test EDIT: BTW: |
Interesting.... Please do 2 threads test. |
Is this necessary? I've done quite a lot of tests. It will probably yield a worse time.. |
Yes, it is. |
2 threads. First run: 472s; second run 477s |
Thanks, looks like I need to set default to max 4 threads. |
Why limit it at all? |
Because it's a waste of electricity for nothing. |
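For reference, capping the thread count does not need anything exotic; a hedged sketch of how it could look with the faster-whisper Python API, which exposes a cpu_threads argument (the value 4 mirrors the proposed default and is only illustrative, as is the audio path):

```python
# Sketch: CPU transcription with a capped thread count.
# On many-core CPUs a small fixed count usually gives most of the speed
# for far less power, which is the trade-off being discussed above.
from faster_whisper import WhisperModel

model = WhisperModel("medium", device="cpu", compute_type="int8", cpu_threads=4)
segments, _ = model.transcribe("audio.wav")
for s in segments:
    print(s.text)
```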
FWIW, over here, on a file that's just shy of three minutes (2:58), in Hebrew, with the large model: Though I will say that Faster-Whisper was more accurate in three or four words out of that file. |
@darnn WhisperDesktop is Const-me, it runs on GPU. |
Oh, yes, I didn't mean to suggest otherwise, it's just that the last time I tried Whisper-Faster, it wouldn't run the large model on the GPU at all, because I didn't have the 10 GB of memory I needed. But I just tried again with your standalone version, and it does indeed run it. I tried the default, 8 and 16 threads, and with all of them the results were 78-83 seconds. The output was still a little bit more accurate than Whisper-Desktop, but, strangely, a little less accurate than with your CPU build. |
A waste of electricity? Where is the electricity wasted? Either it is computing or it is not, there is no "waste"
|
Thanks for the tests. I think that then you've tested the "OpenAI standalone".
Some difference between GPU and CPU results is normal.
Yes. In CPU. That's not how multi-core CPUs work. |
With the latest GPU release, with the default settings, it processes the same file in 55 seconds! So slightly faster than Whisper-Desktop now. Now, the question is, can I tweak any of the settings that would make it more accurate? I've never tried messing with any of these at all, and so I don't even really know what they are (beam size? something?), but as I said before, the CPU's output was slightly more accurate with its default settings. |
You can try to increase beam_size. And you can try ... |
In plain English (I'm sure if you ask ChatGPT it will come up with a better explanation, but I am lazy so I will just type here instead):
beam size means how much we are analysing, so a beam of 1 means we (by "we" I mean the Whisper software) are reading one word at a time. With a beam size of 5 or 10, for example, we read, and after every word we analyse the past 5-10 words for larger context. This can lead to more accuracy if the content is predictable and coherent, but less accuracy if the single words don't make global sense. Feel free to experiment; as stated before, larger beam = slower processing.
|
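As a reference for the accuracy-vs-speed knobs discussed above, a hedged sketch of how beam_size (and the related best_of sampling option) are passed to faster-whisper's transcribe(); the values are just common starting points. Strictly, beam_size is the number of candidate hypotheses kept during beam-search decoding rather than a sliding window of past words, but the practical trade-off described above (larger beam = slower, often a bit more accurate) holds.

```python
# Sketch: decoding options that trade speed for accuracy in faster-whisper.
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, _ = model.transcribe(
    "audio.wav",
    beam_size=5,  # wider beam search: slower, often slightly more accurate
    best_of=5,    # number of candidates kept when sampling with temperature > 0
)
for s in segments:
    print(s.text)
```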
@darnn @rsmith02ct @ras0k Could you test if it works OK on CUDA? Speed should be similar, maybe slower than the previous version. |
Sounds interesting! I won't have time to test thoroughly in the next two or three days, but I will after that! |
Could you share a link to the release so I don't have to look for it? |
Well, huh! I still haven't messed with all the different settings (is there anything other than beam_size that might improve accuracy?), but with the default settings, running through CUDA, it gives me the exact same time WhisperDesktop does for the file I used, 56 seconds. |
Retesting with the 17 min file I tested above. From the command line it gives an error but then proceeds.
--language en --model "medium" --device CPU --no_speech_threshold 0.2
--language en --model "medium" --device CUDA --no_speech_threshold 0.2
So basically as fast as before.
Through SubtitleEdit beta, "ctranslate2" engine. |
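For anyone reproducing that test from Python rather than the standalone command line, roughly equivalent options exist on faster-whisper's transcribe() call; this is only a sketch, the input path is a placeholder, and the device is chosen when the model is loaded rather than per call:

```python
# Sketch: rough Python equivalents of the command-line flags quoted above.
from faster_whisper import WhisperModel

model_cpu = WhisperModel("medium", device="cpu")   # --device CPU
model_gpu = WhisperModel("medium", device="cuda")  # --device CUDA

segments, _ = model_gpu.transcribe(
    "input.wav",
    language="en",            # --language en
    no_speech_threshold=0.2,  # --no_speech_threshold 0.2
)
for s in segments:
    print(s.text)
```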
Have you tested WhisperX? I can't actually figure out how to install it (too complicated), but I'm interested to see if it's got better timestamps |
I did not read the whole thread about Whisper GPU, but can we avoid a lot of problems with VRAM and speed by switching to faster-whisper, maybe?