Discussion on the TTS vocoder #1863
Comments
It is important because I hear problems with low-pitched voices (they sound "older"). If you have been using the single-speaker version, this can be improved by using the multi-speaker vocoder.
It is
So I will test, compare, and report on the advantages of VCTK_V2.
I propose that you simply replace generator_v1, generator_v2, and generator_v3 in your repo with the new ones, since it is not mentioned anywhere that VCTK is not used... @csukuangfj
Hi
This is a question and a possible improvement proposal.
I was trying to train a HiFi-GAN vocoder for different sampling rates (24 kHz, 16 kHz), but all my attempts failed and produced noisy audio.
After seeing this, I noticed that the original HiFi-GAN repo used two V100 GPUs for two weeks to get a good model, and I understood why I kept failing! So I tried to use the existing 22050 Hz versions...
After experimenting with the different versions of the HiFi-GAN vocoder (v1, v2, and v3), I also noticed that v2 is much faster than v1 and faster than v3, without a noticeable difference in quality. So I reached the same decision as the defaults in your repo.
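For anyone who wants to reproduce the speed comparison, here is a minimal sketch of how the real-time factor (synthesis time divided by the duration of the generated audio) could be measured. It is not the code I actually ran: `real_time_factor` and `dummy_vocoder` are hypothetical names, the vocoder is assumed to be any callable that maps mel frames to a sequence of samples, and the 22050 Hz sample rate and hop size of 256 are assumptions that should be replaced with the values from your own config.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    vocoder: Callable[[Sequence], Sequence[float]],
    mel: Sequence,
    sample_rate: int = 22050,  # assumed; use the rate your vocoder was trained on
    runs: int = 5,
) -> float:
    """Average synthesis time divided by the duration of the audio produced.

    Values below 1.0 mean the vocoder runs faster than real time.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")

    # Warm-up call (not timed); also tells us how much audio is produced.
    audio = vocoder(mel)
    audio_seconds = len(audio) / sample_rate
    if audio_seconds == 0:
        raise ValueError("vocoder produced no audio")

    start = time.perf_counter()
    for _ in range(runs):
        vocoder(mel)
    elapsed = (time.perf_counter() - start) / runs
    return elapsed / audio_seconds


def dummy_vocoder(mel: Sequence, hop_size: int = 256) -> list:
    # Hypothetical stand-in for a real generator:
    # emits hop_size silent samples per mel frame.
    return [0.0] * (len(mel) * hop_size)


if __name__ == "__main__":
    mel = [[0.0] * 80 for _ in range(100)]  # 100 frames of 80 mel bins
    print(f"RTF: {real_time_factor(dummy_vocoder, mel):.6f}")
```

To compare v1, v2, and v3, wrap each loaded generator in a callable with this shape and run it on the same mel input; a lower value means faster synthesis.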
There are two versions of the HiFi-GAN v2 vocoder in the original repository, here and here. Which of them are you using? It is not mentioned anywhere.
Note that LJSpeech is single-speaker English, VCTK is multi-speaker English, and the universal dataset is a combination of LibriSpeech (multi-speaker English), VCTK, and LJSpeech.
Note: I previously thought that the universal dataset was multilingual, which is not true.