GPUs: 4, batch size: 10, lr: 1e-4
audio samples: 1 s to 20 s each, 16 kHz, hop_size=200
mel extractor: same as HiFiGAN
text tokenizer: phoneme tokens
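
The frame arithmetic implied by these settings can be sketched as below, which may help when checking whether the preprocessing produces mel lengths in the expected range. This is only a rough sketch: the helper name `mel_frames` is hypothetical, and edge padding is ignored, so the exact count from a HiFiGAN-style extractor may differ by a frame or so.

```python
SAMPLE_RATE = 16_000  # Hz, from the setup above
HOP_SIZE = 200        # samples per mel frame, from the setup above


def mel_frames(duration_s: float,
               sample_rate: int = SAMPLE_RATE,
               hop_size: int = HOP_SIZE) -> int:
    """Approximate mel frame count for a clip (hypothetical helper, ignores padding)."""
    return int(duration_s * sample_rate) // hop_size


frames_per_second = SAMPLE_RATE / HOP_SIZE  # mel frames per second of audio
shortest = mel_frames(1.0)                  # shortest clip in the dataset (1 s)
longest = mel_frames(20.0)                  # longest clip in the dataset (20 s)

print(f"{frames_per_second:.0f} frames/s, {shortest} to {longest} frames per clip")
```

If the mel lengths seen during training fall far outside this range, the sample rate or hop size used at extraction time probably does not match the values listed here.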
I have trained for 250,000 steps but still cannot get a clear mel-spectrogram output. The validation output is blurred, with diluted artifacts.
Has anybody run into this situation? What is the reason? Is the problem in the training data preprocessing, or do some training parameters need adjusting? Waiting for suggestions. Thanks.