Use HuBERT features to train SyncNet, the loss does not converge. #150
Comments
The loss should be BCE rather than MSE. Also, can you provide the code?
Thanks for your reply. I used BCE loss, but the result is the same. I only changed syncnet.py; the other files are the same as in your repo, and my HuBERT features come from Meta's official HuBERT repo. My SyncNet is as follows:
ResBlock1d and DownBlock1d follow DINet: https://github.com/MRzzm/DINet/blob/3b57fb0a2482213327890fbb76baeafdaa412597/models/Syncnet.py#L3 and https://github.com/MRzzm/DINet/blob/3b57fb0a2482213327890fbb76baeafdaa412597/models/Syncnet.py#L55
I replaced the mel spectrogram with HuBERT features to train wav2lip, and training runs end to end. When training SyncNet, however, the loss hovers around 0.69 and does not decrease. With mel spectrograms as input, the loss does decrease. Could you help me figure out what the problem might be?
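A note on the 0.69 figure: it is close to ln 2, which is the binary cross-entropy of a predictor that always outputs 0.5. The short check below is only a sketch in plain Python (the helper name `bce` is made up for illustration); it suggests the SyncNet output may be carrying no usable information about whether the audio and video are in sync, rather than the optimizer merely being slow.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# A model that always predicts 0.5, evaluated on one positive and one negative pair.
chance_loss = 0.5 * bce(0.5, 1) + 0.5 * bce(0.5, 0)

print(round(chance_loss, 4))  # 0.6931
```

If the training loss stays at this value, it may be worth checking whether the audio embedding varies at all between in-sync and out-of-sync samples.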
1: The face encoding in wav2lip has shape (8, 1024, 1, 1), where 8 is the batch size, while my HuBERT features have shape (8, 1024, 10). The mel input has shape (8, 1, 80, 16) and becomes (8, 1024, 1, 1) after the convolutions, which trains normally. I therefore first use permute to rearrange the dimensions and then use Conv1D layers to reduce the last dimension, ending up with (8, 1024, 1, 1). The code is as follows:
2: audio_encoder code:
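For the shape handling in step 1, here is a rough check that does not depend on the poster's code. It applies the output-length formula from the PyTorch `Conv1d` documentation to a hypothetical three-layer stack. The kernel sizes, strides and padding are assumptions chosen for illustration, not values from the issue. It shows one way a time axis of 10 can be reduced to 1 before a trailing unit dimension is added.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (same formula as torch.nn.Conv1d)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Hypothetical (kernel, stride, padding) settings; not taken from the issue.
layers = [(3, 2, 1), (3, 2, 1), (3, 1, 0)]

length = 10  # time axis of the (8, 1024, 10) HuBERT features
lengths = [length]
for kernel, stride, padding in layers:
    length = conv1d_out_len(length, kernel, stride, padding)
    lengths.append(length)

print(lengths)  # [10, 5, 3, 1]

# With 1024 output channels, the result is (8, 1024, 1); adding a trailing
# unit dimension gives the same shape as the face encoding.
batch, channels = 8, 1024
final_shape = (batch, channels, length, 1)
print(final_shape)  # (8, 1024, 1, 1)
```

If the features arrive as (batch, time, 1024) instead, a permute to (batch, 1024, time) is needed first, since `Conv1d` convolves over the last axis.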
I also made the network deeper, but it still did not work. The new network is as follows:
I also changed BCELoss to MSELoss, but the loss still does not converge. Can you help me? Thanks!