Finetune XTTS for new languages #3992
Comments
Hello, man. I'm very pleased with your contribution. Can you provide your trained models? I want to check if they are working well.
Due to copyright issues, I am currently unable to share the model's weights with you. I apologize for the inconvenience.
How long did it take you to train on 100 hours of audio, and can you tell me your current computer configuration?
It took over 8 hours to train on 100 hours of audio on a single A100 40GB.
Understandable. However, would you be able to share an audio snippet of what the model has produced?
Please find the relevant file at the following Google Drive link:
Hi, is it possible to train the XTTS v2 model on about 10 hours of audio, and can it work well based on those 10 hours alone? I trained the model with your code and reached a loss of 0.5, but when I used the model the output was very bad and nothing was intelligible. I used the google/fleurs dataset for the Farsi language. Thank you very much.
First, I recommend you do not train the DVAE (because you have a small amount of data). Also, I think 10 hours is not enough; it makes the model overfit to your data. The losses I got are about 0.8.
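For context, "not training the DVAE" in the upstream Coqui GPT fine-tuning recipe just means passing the released DVAE and mel-stats checkpoints into the GPT trainer instead of your own. A minimal sketch, assuming the field names from `recipes/ljspeech/xtts_v2/train_gpt_xtts.py` (paths are placeholders and may differ across TTS versions):

```python
# Sketch only: reuse the released DVAE instead of retraining it on a small dataset.
# Field names follow the upstream Coqui recipe; all paths are placeholders.
from TTS.tts.layers.xtts.trainer.gpt_trainer import GPTArgs

model_args = GPTArgs(
    max_conditioning_length=132300,   # ~6 s of conditioning audio at 22.05 kHz
    min_conditioning_length=66150,    # ~3 s
    max_wav_length=255995,            # ~11.6 s training clips
    max_text_length=200,
    mel_norm_file="checkpoints/mel_stats.pth",   # released mel statistics
    dvae_checkpoint="checkpoints/dvae.pth",      # released DVAE weights, not retrained
    xtts_checkpoint="checkpoints/model.pth",     # base XTTS v2 weights to fine-tune
    tokenizer_file="checkpoints/vocab.json",     # (extended) tokenizer file
    gpt_num_audio_tokens=1026,
    gpt_start_audio_token=1024,
    gpt_stop_audio_token=1025,
)
```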
Thanks for your good work and your reply, but after inference...
How many epochs and steps are required for training on 100 hours of data? And how many hours did it take, my friend?
Hi, nice work! I'm not involved with it, just an idea.
2 epochs work well for me.
And if the loss decreases to less than 1 but the model still reads the text incorrectly, what is your opinion on that? What do you advise me to do to solve this? That may be my main problem.
I don't want to train the model on the whole language; I want to train it on a limited set of sentences in a new language.
I think it's impossible to overfit the model with only 1000 sentences, especially for a new language. You'd need to extend the tokenizer and likely train a base model on a larger dataset of that language first.
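Extending the tokenizer usually means adding a language tag and the most frequent subwords of the new language to the existing vocab.json. A minimal sketch using the HuggingFace tokenizers library; the file names, corpus path, and the 2000-token budget are illustrative placeholders, not part of this thread:

```python
# Sketch: grow the XTTS vocab.json with subwords learned from a new-language corpus.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# 1) Learn frequent subwords of the new language with a throwaway BPE model.
new_lang_tok = Tokenizer(models.BPE())
new_lang_tok.pre_tokenizer = pre_tokenizers.Whitespace()
new_lang_tok.train(["fa_corpus.txt"], trainers.BpeTrainer(vocab_size=2000))

# 2) Add those subwords plus a language tag to the original XTTS tokenizer;
#    tokens already present in the vocabulary are simply skipped.
xtts_tok = Tokenizer.from_file("vocab.json")
xtts_tok.add_special_tokens(["[fa]"])
xtts_tok.add_tokens(list(new_lang_tok.get_vocab()))
xtts_tok.save("vocab_extended.json")

print("new vocab size:", xtts_tok.get_vocab_size())
```

Note that after extending the vocabulary, the GPT's text-embedding layer has to be resized to the new vocabulary size before fine-tuning.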
Thank you very much. So your opinion is that my problem is the small amount of data, that I cannot get good results from a model trained on only a few sentences, and that it must be trained on a large amount of data. Thank you for sharing your knowledge :)
In short, is it not possible to teach the model a language with 10 letters and about 100 sentences, such that it reads those 100 training sentences correctly?
Hey, great work! I have a question: I want to train this model on Vietnamese, but with vi-north and vi-south as separate languages and separate metadata CSVs for each. Does the multi-dataset training option support this and shuffle the vi-north and vi-south data together while keeping the languages separate? Thank you in advance!
Yes, you can.
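For reference, a minimal sketch of how two metadata CSVs with distinct language codes could be declared and loaded together with Coqui's dataset config. The paths, formatter name, and the "vi-north"/"vi-south" codes are placeholders; whatever codes you use must also be known to the (extended) tokenizer:

```python
# Sketch: two Vietnamese dialect datasets treated as separate "languages".
# load_tts_samples mixes all dataset configs it is given into one training list.
from TTS.config.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples

vi_north = BaseDatasetConfig(
    formatter="coqui",
    dataset_name="vi_north",
    path="data/vi_north",
    meta_file_train="metadata_north.csv",
    language="vi-north",
)
vi_south = BaseDatasetConfig(
    formatter="coqui",
    dataset_name="vi_south",
    path="data/vi_south",
    meta_file_train="metadata_south.csv",
    language="vi-south",
)

train_samples, eval_samples = load_tts_samples(
    [vi_north, vi_south], eval_split=True, eval_split_size=0.02
)
```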
Hi, I'm trying to add a new language, Telugu. I collected almost 100 hours of speech data and trained the DVAE and GPT. These were the metrics: --> EVAL PERFORMANCE
Hello everyone, below is my code for fine-tuning XTTS for a new language. It works well in my case with over 100 hours of audio.
https://github.com/nguyenhoanganh2002/XTTSv2-Finetuning-for-New-Languages
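As a rough sketch of how a fine-tuned checkpoint like this is typically loaded for inference with Coqui's XTTS API; the checkpoint directory, reference clip, example text, and language code below are placeholders:

```python
# Sketch: load a fine-tuned XTTS checkpoint and synthesize one sentence.
import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("finetune_output/config.json")

model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="finetune_output/", eval=True)
if torch.cuda.is_available():
    model.cuda()

out = model.synthesize(
    "Xin chào, đây là một câu thử nghiệm.",  # any text in the fine-tuned language
    config,
    speaker_wav="reference.wav",  # a few seconds of the target speaker
    language="vi",                # must match a code the tokenizer knows
)
# out["wav"] holds the generated waveform (24 kHz).
```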