Discussion on the TTS vocoder #1863

mah92 · 2025-02-15T04:32:20Z

Hi,
This is a question and a possible improvement proposal.
I was trying to train a HiFi-GAN vocoder for other sampling frequencies (24 kHz, 16 kHz), but all my attempts failed and produced noisy speech.
After seeing this, I noticed that the original HiFi-GAN repo used two V100 GPUs for two weeks to get a good model, and I understood why I kept failing! So I tried to use the existing 22050 Hz versions...
After experimenting with the different versions of the HiFi-GAN vocoder (v1, v2, and v3), I also noticed that v2 is much faster than v1 and faster than v3, without a noticeable difference in quality. So I reached the same decision as the defaults in your repo.
There are two versions of the HiFi-GAN v2 vocoder in the original repository, here and here. Which of them are you using? It is not mentioned anywhere.

| Folder Name | Generator | Dataset | Fine-Tuned |
|---|---|---|---|
| LJ_V1 | V1 | LJSpeech | No |
| LJ_V2 | V2 | LJSpeech | No |
| LJ_V3 | V3 | LJSpeech | No |
| LJ_FT_T2_V1 | V1 | LJSpeech | Yes (Tacotron2) |
| LJ_FT_T2_V2 | V2 | LJSpeech | Yes (Tacotron2) |
| LJ_FT_T2_V3 | V3 | LJSpeech | Yes (Tacotron2) |
| VCTK_V1 | V1 | VCTK | No |
| VCTK_V2 | V2 | VCTK | No |
| VCTK_V3 | V3 | VCTK | No |
| UNIVERSAL_V1 | V1 | Universal | No |

Note that LJSpeech is single-speaker English, VCTK is multi-speaker English, and the universal dataset is a combination of LibriSpeech (multi-speaker English), VCTK, and LJSpeech.
Note: I previously thought that the universal dataset was multilingual, which is not true.
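
As a rough sketch, the table above can be written as a small Python lookup to pick a checkpoint folder by generator version and dataset. The names `Checkpoint`, `CHECKPOINTS`, and `find_folders` are made up for illustration and are not part of the HiFi-GAN repository or of sherpa; only the folder names and their attributes come from the table.

```python
from typing import List, NamedTuple, Optional


class Checkpoint(NamedTuple):
    folder: str       # folder name as listed in the table
    generator: str    # "V1", "V2", or "V3"
    dataset: str      # "LJSpeech", "VCTK", or "Universal"
    fine_tuned: bool  # True for the Tacotron2 fine-tuned variants


# Rows copied from the table above (hypothetical container, not a real API).
CHECKPOINTS = [
    Checkpoint("LJ_V1", "V1", "LJSpeech", False),
    Checkpoint("LJ_V2", "V2", "LJSpeech", False),
    Checkpoint("LJ_V3", "V3", "LJSpeech", False),
    Checkpoint("LJ_FT_T2_V1", "V1", "LJSpeech", True),
    Checkpoint("LJ_FT_T2_V2", "V2", "LJSpeech", True),
    Checkpoint("LJ_FT_T2_V3", "V3", "LJSpeech", True),
    Checkpoint("VCTK_V1", "V1", "VCTK", False),
    Checkpoint("VCTK_V2", "V2", "VCTK", False),
    Checkpoint("VCTK_V3", "V3", "VCTK", False),
    Checkpoint("UNIVERSAL_V1", "V1", "Universal", False),
]


def find_folders(
    generator: Optional[str] = None,
    dataset: Optional[str] = None,
    fine_tuned: Optional[bool] = None,
) -> List[str]:
    """Return folder names matching every filter that is not None."""
    result = []
    for ckpt in CHECKPOINTS:
        if generator is not None and ckpt.generator != generator:
            continue
        if dataset is not None and ckpt.dataset != dataset:
            continue
        if fine_tuned is not None and ckpt.fine_tuned != fine_tuned:
            continue
        result.append(ckpt.folder)
    return result
```

For example, `find_folders(generator="V2", dataset="VCTK")` selects the multi-speaker v2 checkpoint folder, while `find_folders(generator="V2", dataset="LJSpeech", fine_tuned=False)` selects the single-speaker one.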

mah92 · 2025-02-15T04:36:17Z

This matters because I hear problems with low-pitched voices (they sound "older"). If you have used the single-speaker version, this could be improved by using the multi-speaker vocoder.

csukuangfj · 2025-02-15T12:17:54Z

It is LJ_V{1,2,3}.

mah92 · 2025-02-15T15:04:13Z

So I will test, compare, and report on the advantage of VCTK_V2.

mah92 · 2025-02-15T18:07:19Z

And the miracle happened. Musa has become 20 years younger! Thank God...
I swapped in the VCTK vocoder and suddenly the man's voice became clear...
For comparison examples, see: this and that.
For the new model with sherpa metadata, see here.

mah92 · 2025-02-15T18:32:08Z

~~The Khadijah(female) voice has not changed with the vctk vocoder.~~ Even the female voice reads some letters better.

mah92 · 2025-02-17T08:51:36Z

I propose that you just replace generator_v1, generator_v2, and generator_v3 in your repo with the new ones, since it is not mentioned anywhere that the VCTK versions are not used... @csukuangfj
