
Help installing the Python component for CTranslate2 and WhisperX #6914

Closed
Cyberyoda1411 opened this issue May 9, 2023 · 64 comments

@Cyberyoda1411

I have a big problem installing everything needed for the Audio to text (Whisper) option, engines CTranslate2 and WhisperX. I didn't try OpenAI, but I think it would be the same problem: a mix of installation steps, error messages, and insisting on putting everything in PATH.

I go to the page https://github.com/guillaumekln/faster-whisper (for CTranslate2)
and I use the command: pip install faster-whisper
Then, when I want to use the CTranslate2 module, a window asks me to locate the file whisper-ctranslate2.exe, and it doesn't exist.

The same happens with WhisperX at https://github.com/m-bain/whisperX.
I use the command pip install git+https://github.com/m-bain/whisperx.git and I get two error messages. Some are related to the unknown command git. And I can't find the whisperx.exe file that SE wants me to locate.

Anyway, I get a lot of "you need to add it to PATH" and error messages. Something is installed on my computer, who knows where.

Can you help me resolve this and use these two engines? Maybe you can also explain how to use OpenAI (how to configure it). And just maybe, SE could do it for us, because it is more than complicated for me. I am not timid, but I can't find a way out of this.

A BUG: I accidentally resized the window that asks for the whisper-ctranslate2.exe and whisperx.exe locations and I can't bring it back to normal; it is GIGANTIC. I can see the handles for making it smaller, but nothing happens when I try to resize these dialog boxes.

@niksedk
Member

niksedk commented May 10, 2023

Try reading more here: https://www.nikse.dk/subtitleedit/help#audio_to_text_whisper
(you might need to refresh the page with Ctrl+F5)

First uninstall Python, then re-install Python 3.10: https://www.python.org/ftp/python/3.10.11/python-3.10.11-amd64.exe

Check the "Add to PATH" check box during installation:
image

Open a command prompt and type: pip install --upgrade pip

To install the latest CTranslate2 (Faster-Whisper), run: pip install git+https://github.com/jordimas/whisper-ctranslate2.git

To install the latest OpenAI Whisper, run: pip install git+https://github.com/openai/whisper.git

Do they work from SE now?
If not, please post error_log.txt and/or any installation errors.

Note: Whisper requires a newer CPU with AVX2, and 16 GB of RAM or more is required for the large models.
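Before installing anything, it can help to confirm the prerequisites above are actually reachable. This is a stdlib-only sketch, not part of SE; the tool names checked (git, pip, whisper-ctranslate2, whisperx) are the ones discussed in this thread:

```python
# Sanity check for the prerequisites mentioned above, using only the
# Python standard library. Each entry is True when the tool (or the
# expected Python version) is available.
import shutil
import sys


def tool_on_path(name: str) -> bool:
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


def check_environment() -> dict:
    return {
        "python_3_10": sys.version_info[:2] == (3, 10),
        "git": tool_on_path("git"),
        "pip": tool_on_path("pip"),
        "whisper-ctranslate2": tool_on_path("whisper-ctranslate2"),
        "whisperx": tool_on_path("whisperx"),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Anything reported MISSING here is the usual cause of the "SE can't find whisper-ctranslate2.exe / whisperx.exe" dialogs.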

@Cyberyoda1411
Author

I still get the error message related to git:

Screenshot_1

@niksedk
Member

niksedk commented May 10, 2023

Ah, you also need Git for Windows: https://gitforwindows.org/

@Cyberyoda1411
Author

Thank you. All works fine. OpenAI, CTranslate2, WhisperX. GIT for windows was missing.

@despairTK

despairTK commented May 14, 2023

I installed CTranslate2 and downloaded the corresponding model, but when I run CTranslate2, the GPU does not work, or it directly generates blank text. OpenAI works fine, though. There were no errors when installing CTranslate2; the installation was very smooth.

QQ截图20230514195202

@niksedk
Member

niksedk commented May 14, 2023

@despairTK: Does the small model work any better? What OS are you on?

@despairTK

Does the small model work any better? What OS are you on?

All models give the same result; my OS is Windows 11.

@Purfview
Contributor

@despairTK Did you try the standalone Whisper binaries?

@kingchobo10

I too am getting no GPU usage in WhisperX and CTranslate2, and only blank subtitle files are created. I am using Windows 11.

@Purfview
Contributor

@kingchobo10 Did you try the standalone Whisper binaries? https://github.com/Purfview/whisper-standalone-win

@despairTK

Did you try the standalone Whisper binaries?

QQ截图20230515095811
The standalone binaries can run.

@Purfview
Contributor

Purfview commented May 15, 2023

No, that is not a standalone binary in your screenshot.

@despairTK

@despairTK: Does the small model work any better? What OS are you on?

I read the issues of CTranslate2 and found the problem. When I set --device cpu, CTranslate2 outputs SRT files and normal text, but with --device cuda it does not output any SRT files or text.

I provided two screenshots. It seems the original problem has not been resolved?
Softcatala/whisper-ctranslate2#11

1
2

@kingchobo10

@kingchobo10 Did you try the standalone Whisper binaries? https://github.com/Purfview/whisper-standalone-win

제목 없음

I installed CUDA as instructed and ran the standalone binary, but I got an error message.

@kingchobo10


제목 없음

I used this

@Purfview
Contributor

Maybe you need to restart Windows.

@kingchobo10

Maybe you need to restart Windows.

I've already restarted Windows, and in the old version, in normal Whisper mode, I could use the GPU.

@Purfview
Contributor

I think path should be in "PATH", not "CUDA_PATH....".

You can try previous "b103" release.

@kingchobo10

I think path should be in "PATH", not "CUDA_PATH....".

You can try previous "b103" release.

Thank you, I'll give it a try

@kingchobo10

I think path should be in "PATH", not "CUDA_PATH....".

You can try previous "b103" release.

1

GPU RAM usage goes up to 1.7 GB. So far Subtitle Edit behaves exactly the same; however, SE only creates an empty subtitle file after that and finishes. The standalone binary uses up to 3 GB and works. I think it's because of the error message: it only works when I hit Enter some time after this error appears: 2023-05-15 15:11:56.0324187 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1671 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.
The standalone binary works fine, but GPU utilization is low.

2

@Purfview
Contributor

I think it's because of the error message.

It's not the error message.

it only works when I hit enter

No, pressing Enter does nothing.

GPU utilization is low

Looks normal.

@despairTK

despairTK commented May 15, 2023

No, that is not a standalone binary in your screenshot.

I solved the https://github.com/Purfview/whisper-standalone-win runtime error by searching the issues. Thank you so much!

@Purfview
Contributor

I solved https://github.com/Purfview/whisper-standalone-win runtime error by searching the issue. Thank you so much

What error, and how did you solve it?

That first screen was a download error, probably because your firewall blocked the internet connection.
The second screen shows that the cuDNN 8.x library is not found.

@despairTK

I have solved these errors through Google, but when I use it in Subtitle Edit, the transcription speed feels similar to OpenAI Whisper. It may be because I select each line of subtitles for transcription. Looking forward to follow-up updates of Faster-Whisper.

@Purfview
Contributor

I have solved these errors through Google

But you didn't write how you actually solved them.

It may be caused by the fact that I select each line of subtitles for transcription.

I don't understand what that means, can you make a screenshot?

@darnn

darnn commented May 15, 2023

I believe it means they have subtitles that are already timed, and they're selecting them all and running Whisper on them, which means it has to load the model and so on for every one, which is why it's slow.

@Purfview
Contributor

I believe it means they have subtitles that are already timed, and they're selecting them all and running Whisper on them, which means it has to load the model and so on for every one, which is why it's slow.

I didn't know SE had such a feature. In that case all Whisper implementations will be slow, because most of the time is consumed by loading the model for every line; plus "r117" is packed into one file, so add a few seconds of unpacking. In such a case it may make sense to use the non-packed "b103" release.

I see people have problems with the Nvidia libs with "r117". I could make a release including them, but I don't know which libs are actually needed, and I don't want to include the whole 4 GB.
Could someone check which libs are needed by copying the DLLs [into the same folder as whisper.exe] one by one on each error, from the "b103" release? There the libs are located at "Whisper-Faster\torch\lib".

@despairTK

despairTK commented May 16, 2023

I have solved these errors through Google.

But you didn't write how you actually solved them.

It may be caused by the fact that I select each line of subtitles for transcription.

I don't understand what that means, can you make a screenshot?

I'm sorry I didn't write out my solution. I'll describe it now; I think it may be useful for most people, especially those who can run OpenAI Whisper normally but cannot run whisper-standalone-win.

Generally, people who can run OpenAI Whisper have CUDA Toolkit 11.7 installed (I installed this version myself; yours may differ), but running whisper-standalone-win additionally requires cuDNN. I downloaded cuDNN v8.9.0 (April 11th, 2023) for CUDA 11.x (the cuDNN version needs to correspond to the CUDA Toolkit version you installed; pay attention to the version number at the end of both), and unzipped it.

Copy all the files inside into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7 (adjust to your own installation path).

At this point, if running whisper-standalone-win prompts:
Could not load library cudnn_ops_infer64_8.dll. Error code 126. Please make sure cudnn_ops_infer64_8.dll is in your library path!
(or a .dll with another name), open the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin directory, copy the DLL named in the error (here cudnn_ops_infer64_8.dll), then open C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64 and paste it there.

If, after completing the above operations, you still get:
Could not load library cudnn_ops_infer64_8.dll. Error code 126. Please make sure cudnn_ops_infer64_8.dll is in your library path!
then download the ZLIB DLL; after downloading and unzipping it, add its path to PATH, open a new cmd (or restart), and it will be OK.

If I'm not clear enough, please refer to these two links:
https://blog.csdn.net/qq_43503670/article/details/119744550
https://blog.csdn.net/HaoZiHuang/article/details/123196601

I tried b103, and the transcription speed is about the same. Just waiting for an update.

I remember that guillaumekln mentioned this situation under Softcatala/whisper-ctranslate2#11: "You could load the model once and then use the same model instance to transcribe each file. This should work around the issue and also be more efficient than reloading the model each time."

In fact, https://github.com/Const-me/Whisper is very fast for my line-by-line transcription, much faster than OpenAI Whisper, but it has the disadvantage that some single-line subtitles that are too short cannot be recognized, and sometimes a sentence is not recognized completely.
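guillaumekln's suggestion quoted above (load the model once, then reuse it for each file) can be sketched like this. `transcribe_all` works with any object exposing a faster-whisper-style `transcribe(path)` method; the commented-out `WhisperModel` usage is the real faster-whisper API, but the model name and file list are illustrative:

```python
# Load-once / transcribe-many pattern: the expensive part (model
# loading) happens a single time, then the same instance handles every
# line or file.
def transcribe_all(model, audio_paths):
    """Transcribe many files with one already-loaded model instance."""
    results = {}
    for path in audio_paths:
        # faster-whisper returns (segments generator, info); each
        # segment carries the recognized .text
        segments, _info = model.transcribe(path)
        results[path] = " ".join(s.text for s in segments)
    return results


# With the real library (pip install faster-whisper) it would look like:
#   from faster_whisper import WhisperModel
#   model = WhisperModel("small", device="cpu", compute_type="int8")  # loaded once
#   print(transcribe_all(model, ["line1.wav", "line2.wav"]))
```

This is the work-around for the per-line slowness discussed above: the per-file cost drops to transcription only, instead of model load plus transcription.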

@rsmith02ct

GPU is broken in ctranslate2 for most users, and nothing is output, either standalone or through Subtitle Edit.

It seems to want CUDA 11.x and 12.x and cuDNN installed, but even then I can't get it to function. We're waiting for fixes to ctranslate2's CLI.

There is a patched version, Faster-Whisper with functional CUDA in Subtitle Edit and standalone, here: Purfview/whisper-standalone-win#11
If you access it via CTranslate2 in Subtitle Edit, you need to change the exe name to what SE is looking for.

@darnn

darnn commented May 16, 2023

Wait, has anyone managed to run WhisperX 2/3 on Windows? As far as I can tell it requires a recent version of JAX, which can't be installed on Windows at the moment (and when I try to run WhisperX under WSL I get the error "'type' object is not subscriptable", which is beyond the scope of this thread and which I don't understand well enough to even try to solve).

@Purfview
Contributor

Make a screenshot of that folder.

@suiram96

image

image

image

@Purfview
Contributor

Purfview commented May 23, 2023

Maybe it's because of a non-Latin character in the path. Try moving the Whisper-Faster folder to the disk's root. [D:\Whisper-Faster]

PS:
ffmpeg is not needed by faster-whisper.

@suiram96

Thank you very much!
It does work now. I can't believe it was something as silly as that; I always try to avoid non-Latin characters in folder and file names...
It took about 5-6 minutes to process, and it didn't use the GPU at all.
Is that a standard time, or should it be faster?
I have:
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
32GB RAM
NVIDIA GeForce GTX 1650 GDDR5 @ 4GB (128 bits)
Could it be that the GPU isn't compatible?

Also, the subtitles don't show the non-latin characters like á é í ó ú.
image
Is it an encoding issue? I have selected UTF-8 without BOM.

@suiram96

PS: ffmpeg is not needed by faster-whisper.

Okay, I'll delete it from the folder. I copied it there trying to solve the problem when I was running out of options 😅

@Purfview
Contributor

Look at error_log.txt; there you'll see whether CPU or CUDA is used.

Dunno about those letters; cut and share a short sample of that audio.

@suiram96

I ran it again and it took about 4:20 for a 13:16 video, which I'm quite happy with.
Also, it seems it is running on CUDA, but then it gets a RuntimeError for running out of memory. I was monitoring the system, and it was using about 3.5 GB of GPU VRAM but about 2% utilization.
image

The subtitles still don't display those characters. I'll try reinstalling SE to see if that solves the issue. It didn't happen before.

Here is a 10 second sample of the audio:
NB 2023 05 16 sample.zip

@Purfview
Contributor

But from your screenshot it's not running at all.
Try using it from the console without SE; right now it's not clear what you are doing there.

@suiram96

I ran it through the command prompt and specified CUDA. It worked well, and the subtitles have all the characters. It took about the same time, 4:14, and monitoring the system it used about 3 GB of VRAM and 0.2% of the GPU processing power.
I have tried it in SE again after reinstalling it, but it still doesn't display the accented characters, weird...

@Purfview
Contributor

Purfview commented May 23, 2023

For me it looks OK on your sample:
alt text

I have tried it in SE again, after reinstalling it, but it still doesn't display the characters with accents, weird...

Are you sure it runs in SE?
Delete error_log.txt and all subtitles in the Faster-Whisper folder. Then run the sample you sent me in SE.
Then share the content of error_log.txt.

@suiram96

I have deleted all the models and downloaded them again, but it still produces subtitles like this:
image

I have been using Faster-Whisper through the command line and it's been working just fine. It is a bit tedious, though...
The issue happens when it "gives" the subtitles to SE and SE doesn't "understand" those characters. Either that, or Whisper sends the text in a different encoding.
It's weird, because it didn't happen before with the other engines, and because I can open the separately generated subtitles without problems. They are generated correctly by Whisper and imported correctly by SE.
Is there any way to configure Whisper and set the encoding when using it from SE?

Here is the new "error_log.txt" file:
error_log.txt

I'll try the CPP engine to see if it also happens with it.
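A hedged guess at what is happening here: the UTF-8 text Whisper writes is being re-read as a legacy codepage (e.g. cp1252), which turns "á" into "Ã¡". If the broken subtitles show that pattern, it can be reversed losslessly. This is a diagnostic sketch, not an SE setting:

```python
# Undo the classic "UTF-8 read as cp1252" mojibake: re-encode the
# wrongly decoded text back to bytes, then decode those bytes as UTF-8.
def repair_mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Repair a UTF-8 string mistakenly decoded with `wrong_codec`."""
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not the cp1252-over-UTF-8 pattern; leave unchanged


if __name__ == "__main__":
    print(repair_mojibake("InformaciÃ³n"))  # -> "Información"
```

If running this over a broken .srt restores the accents, the bug is an encoding mismatch between Whisper's output and how SE reads it, not a recognition problem.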

@suiram96

With CPP it works fine:
image

@Purfview
Contributor

Purfview commented May 23, 2023

You copied it into Windows' folders; you don't want to do that with portable programs, especially when you don't run your system as Administrator. Move it out of there!

Btw, do you run SE Beta?

@suiram96

I'm running SE version 3.6.13, which seems to be the latest; I didn't see any beta version.
I've moved Whisper to another folder, but why would there be an issue with portable programs in Windows' folders?
Normally I would keep them somewhere else, since they're portable, but in this case I just wanted to place it in that folder...
I am running Windows as Administrator, by the way.

@Purfview
Contributor

Purfview commented May 23, 2023

Or the real Administrator, aka Windows is running you. :P
Because Windows doesn't like non-installed programs doing stuff in its folders.

Can you do a few tests in the console with large-v2? Check GPU VRAM usage with these 3 different options:
--compute_type=int8, --compute_type=float16, --compute_type=float32.
Close other programs like your internet browser before doing the tests.
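The three tests above can be scripted. `--model`, `--device`, and `--compute_type` are real whisper-ctranslate2 flags, but "sample.wav" is a placeholder for your own clip, and the loop syntax assumes a POSIX shell (Git Bash or WSL on Windows; in plain cmd, run the three commands individually):

```shell
# Run the same clip through large-v2 with each compute type, so VRAM
# usage can be compared in Task Manager between runs.
for ct in int8 float16 float32; do
  echo "=== compute_type=$ct ==="
  whisper-ctranslate2 sample.wav --model large-v2 --device cuda --compute_type "$ct"
done
```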

@suiram96

I was just doing that to see what happens 😁

  • Using it with the default int8 works fine and uses about 2.5-2.7 GB of VRAM.
  • With int8_float16 it starts working and uses almost all of the VRAM, but it just outputs consecutive numbers, one number per line, like 8, 9, 10, 11, 12..., with weird timecodes starting around the 3-minute mark or so...
  • With int16 it starts, the VRAM usage ramps up to 3.9 GB, and then it says that CUDA stopped working because of a lack of memory.
  • With float16, the same.
  • With float32, it says my system is incompatible.

So the only one working is the int8.

Another thing: whenever I start the process, this message appears, but it still works. It appears no matter which compute type I use:
2023-05-24 00:06:27.0846549 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1671 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.

PS:
I've just tried int8_float16 again, and now it does nothing. It's using between 2.3 GB and 3.3 GB of VRAM but hasn't output anything after about 5 minutes...

@suiram96

suiram96 commented May 23, 2023

Or real Administrator aka Windows is running you. :P

You never really know 👀

@suiram96

So the only one working is the int8.

By the way, GPU usage is about 2% total, and the program itself uses about 0.5%.

@suiram96

Here are a couple snapshots:

01

02

@Purfview
Contributor

Purfview commented May 23, 2023

Try --compute_type=auto; I'm interested in what type it will auto-select.

Another thing, whenever I start the process, this message appears, but it still works. It appears no matter what int option I use: 2023-05-24 00:06:27.0846549 [W:onnxruntime

Just ignore it.

@Cyberyoda1411
Author

Do you know that you have exchanged 51 MESSAGES on this topic? When you start your CHAT, because it is a chat, not a problem solution, you make it THE CHAT; I get mail notifications every 5 minutes, as if you will DIE if you can't solve this problem. Why don't you chat in some other place? Go to Skype and do it forever.
Find one audio-to-text solution that works for you; they are almost the same. There is no solution that is 100% better. If the CPP method works for you, that is OK. Don't install Python if you can't, for unknown reasons.
I had that problem and I solved it in 5 messages with Nikolaj, and you can't solve yours in 51???!! How many messages will you exchange? 3000?

@suiram96

It jumps up and down between about 2.3 GB and 3.3 GB, like with int8_float16:

03

Also, it seems like it is "hallucinating" subtitles:

06

@suiram96

06

That means "Subtitles made by the Amara.org community" 🤷‍♂️

@suiram96

Do you know that you exchanged 51 MESSAGES for this topic? [...]

What the heck, man?
I won't "die" if I don't solve this problem, but I thought this is what this thread was for: solving problems.
And the issue now is not installing Python, but something with the character encoding.
Then Purfview, who posted the standalone version of Faster-Whisper, asked me questions relevant to the problem. It's not like we were talking about the weather...
This might happen to somebody else too, and if I solve the issue, it might help them.
Also, I think you can unsubscribe to stop receiving notifications if it bothers you, right?

@Purfview
Contributor

@suiram96 But a few posts ago you wrote that int8_float16 doesn't produce subtitles for you.
This topic is actually not related to your issues; please open a topic in my repo.

I get mail notifications every 5 minutes

@Cyberyoda1411 At the bottom of the email there is a link to unsubscribe from the topic.

@Cyberyoda1411
Author

If you want me to LEAVE the Subtitle Edit forum because of you, I won't. You can't stop me from giving new suggestions about how this program can be better or how to solve problems I have.

But if there is a way for me to LEAVE THIS endless issue, tell me how. I will gladly give up on this issue. Maybe I can click CLOSE WITH THIS COMMENT??? But I am almost sure that when you write the next comment, it will be open again.

@suiram96

Sorry for the inconvenience, brother. I didn't want you to leave or whatever. I didn't think it would bother anybody.
Peace.

@Purfview
Contributor

But if there is a way I to LEAVE THIS endless issue, tell me how to do it.

Read my post above yours.
Another way: there is an unsubscribe button at the top-right of this thread.

@niksedk niksedk mentioned this issue Dec 3, 2023
8 participants