Step By Step Tutorial? #85
-
I understand that this may be a rookie request, but a user-friendly guide or tutorial would be greatly appreciated. I have been struggling for hours trying to understand how to use this project. I would just like to clone a voice from a 4-hour audio file and use it for text-to-speech, locally on Windows, but I don't even know where to begin. The README is very vague and hard to understand; it doesn't even explain how to install eSpeak-ng on Windows (which is supposed to be a requirement).
Replies: 5 comments 6 replies
-
I have a guide I can share. I hesitate, though, as it's a complex repo, which in turn requires a complex guide. There are a lot of very nuanced things that can go wrong. While I do have a very successful and impressive result, I'm almost sure the steps I am taking could be improved or corrected in some way. I'm open to any corrections, of course. Give me a bit and I'll post it.
-
If anyone prefers to run the model locally, I made a hacky CLI to do that. https://github.com/persuck/StyleTTS2
-
Try the Hugging Face demo or the local GUI. For the offline local web GUI, check out the GPL fork (GPL because of phonemizer): https://github.com/NeuralVox/StyleTTS2
-
OK, this took longer than I thought, but I tested it and it works. I suggest you follow everything exactly. Also keep in mind that I have no degree and taught myself to code, so this is kind of scary, but here you go. I tried to automate as much as I could for you. Let me know how it goes! https://github.com/IIEleven11/StyleTTS2FineTune
-
To train from scratch you can do this:
For some, the biggest issue will be GPU memory. Most projects will need at least 16 GB, but that will not be enough for second-stage training, which I've seen take up to 80 GB.
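If you want to check your own card against those numbers before starting, here is a minimal sketch. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH. The function names and the two thresholds are just illustrative, taken from the rough figures above, and are not anything defined by StyleTTS2 itself.

```python
import shutil
import subprocess

# Rough figures quoted in the comment above; these are assumptions, not measured limits.
FIRST_STAGE_MIN_GB = 16
SECOND_STAGE_MAX_SEEN_GB = 80


def describe_fit(total_gb: float) -> str:
    """Classify a GPU's total memory against the rough figures quoted above."""
    if total_gb < FIRST_STAGE_MIN_GB:
        return "below the ~16 GB quoted for most projects"
    if total_gb < SECOND_STAGE_MAX_SEEN_GB:
        return "enough for ~16 GB workloads, but second-stage training may run out"
    return "at or above the ~80 GB worst case quoted for second-stage training"


def query_gpu_memory_gb() -> list[float]:
    """Return total memory per NVIDIA GPU in GB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    sizes = []
    for line in result.stdout.splitlines():
        try:
            # With these flags nvidia-smi reports MiB; convert to GiB.
            sizes.append(float(line.strip()) / 1024)
        except ValueError:
            continue  # skip blank or "[N/A]" entries
    return sizes


if __name__ == "__main__":
    sizes = query_gpu_memory_gb()
    if not sizes:
        print("No NVIDIA GPU detected via nvidia-smi.")
    for index, size in enumerate(sizes):
        print(f"GPU {index}: {size:.1f} GB, {describe_fit(size)}")
```

This only reports total memory per card. Actual usage will depend on batch size, clip length, and config, so treat the result as a rough guide.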