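For training, pitch is extracted with pyworld, including the unvoiced (uv) information.
For inference, pitch is extracted with torchcrepe without uv, because torchcrepe's uv decisions are not accurate enough and would introduce audible flaws.
The mel spectrogram already carries the uv information, so pitch without uv can be used for inference.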
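As a rough illustration of the difference between "pitch with uv" and "pitch without uv", the sketch below assumes a pyworld-style f0 track in which unvoiced frames are marked with 0. It splits off a uv flag and fills the unvoiced gaps by linear interpolation, giving a contour that is greater than zero everywhere, similar in spirit to the uv-free pitch used at inference. The helper name `split_f0_uv` is hypothetical and not part of NSF-BigVGAN.

```python
import numpy as np


def split_f0_uv(f0: np.ndarray):
    """Hypothetical helper: split a pyworld-style f0 track (0 = unvoiced)
    into a continuous f0 contour and a 0/1 voiced flag."""
    uv = (f0 > 0).astype(np.float32)  # 1 for voiced frames, 0 for unvoiced
    if uv.sum() == 0:
        # No voiced frames at all: nothing to interpolate from.
        return np.zeros_like(f0), uv
    idx = np.arange(len(f0))
    voiced = uv > 0
    # Linear interpolation across unvoiced gaps; edges hold the nearest voiced value.
    cont = np.interp(idx, idx[voiced], f0[voiced])
    return cont.astype(f0.dtype), uv
```

For example, `split_f0_uv(np.array([0.0, 100.0, 0.0, 200.0, 0.0]))` gives a continuous contour of `[100, 100, 150, 200, 200]` and a uv flag of `[0, 1, 0, 1, 0]`.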
https://github.com/PlayVoice/NSF-BigVGAN/blob/d7204d8329e67e597f8856f3d0db596123a01d15/model/nsf.py#L305
The pitch values extracted from the audio are all greater than zero, so SineGen does not use the unvoiced information. Why is this done? Is the unvoiced information not important?
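To make the question concrete, here is a minimal sketch of how an NSF-style sine generator typically uses uv, assuming it follows the common NSF design: uv is derived as `f0 > voiced_threshold`, the sine is gated by uv, and unvoiced regions get noise only. The function name `sine_excitation` and all default values are assumptions for illustration, not the code at the linked line of `nsf.py`.

```python
import numpy as np


def sine_excitation(f0, sr, sine_amp=0.1, noise_std=0.003,
                    voiced_threshold=0.0, seed=0):
    """Hypothetical NSF-style excitation. f0 is a per-sample f0 in Hz
    (already upsampled to the audio rate). Returns (excitation, uv)."""
    rng = np.random.default_rng(seed)
    f0 = np.asarray(f0, dtype=np.float64)
    # uv is derived from f0 itself: any f0 above the threshold counts as voiced.
    uv = (f0 > voiced_threshold).astype(np.float64)
    # Integrate instantaneous frequency to get the sine phase.
    phase = 2 * np.pi * np.cumsum(f0 / sr)
    sine = sine_amp * np.sin(phase)
    # Small noise in voiced regions, larger noise in unvoiced regions.
    noise_amp = uv * noise_std + (1 - uv) * sine_amp / 3
    noise = noise_amp * rng.standard_normal(f0.shape)
    # Sine is switched off where uv == 0, leaving noise-only excitation.
    return sine * uv + noise, uv
```

Under this design, feeding an f0 that is greater than zero everywhere (as with torchcrepe output without uv) makes `uv` all ones, so the sine branch is always on. Any unvoiced character then has to come from the rest of the network, which is consistent with the answer above that the mel spectrogram carries the uv information.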