Training problem #11
Comments
Hi, thanks for your interest in our work. Which dataset are you working with? From the first training log, I can see that none of the individual training loss terms are NaN, so their sum shouldn't be NaN either. This is quite unusual.
Thanks for the reply. I'm using the Vocaset dataset for training.
I have the same problem. All parameters use the author's default settings, but the final loss does not converge.
2024-07-22 18:56:45,895 Epoch 8993: Train_vertice_recon 3.705e-07 Train_vertice_reconv 2.486e-08 Train_lip_recon 0.000e+00 Train_lip_reconv 0.000e+00 Val_vertice_recon 5.470e-07 Val_vertice_reconv 3.962e-08 Val_lip_recon 0.000e+00 Val_lip_reconv 0.000e+00 Memory 50.9%
2024-07-21 18:10:19,878 Training started |
Same issue for me! I used …
The NaN loss is because of the None return from the update() function in DIFFUSION_BIAS. When you override update() on a Metric, no return value is expected. So you should rewrite the loss in allsplit_step instead; please follow this link, it works for me: #5 (comment)
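For context, torchmetrics ignores any value returned from an overridden update(), so a metric cannot serve as the training loss. Below is a minimal sketch of the intended pattern (the class and state names are hypothetical, not the repo's actual code): update() only accumulates state, and compute() reports it at logging time.

```python
import torch
import torch.nn.functional as F
from torchmetrics import Metric

# Hypothetical metric in the style this thread discusses: update()
# accumulates state and returns nothing; compute() reports the record.
class VerticeReconMetric(Metric):
    def __init__(self):
        super().__init__()
        self.add_state("total", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("count", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, pred: torch.Tensor, target: torch.Tensor) -> None:
        # Accumulate the reconstruction error. Note: no return value;
        # anything returned here would be discarded by torchmetrics.
        self.total += F.mse_loss(pred, target, reduction="sum")
        self.count += pred.numel()

    def compute(self) -> torch.Tensor:
        # Called at logging time; this is a record, not the training loss.
        return self.total / self.count
```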
Hi, do I only need to rewrite allsplit_step? Does update() still need to run?
update() still needs to run, so that the recon loss and the recon-velocity loss get recorded. But all the losses should be aggregated by rewriting allsplit_step (aggregate and return them there; update() then only serves as a record-keeper).
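Putting the two answers together, here is a hedged sketch of that division of labor (the method signature, batch keys, and self.metrics are assumptions based on this thread, not the repo's actual code): the summed loss tensor is returned from the step so it can be backpropagated, while the metric's update() is called purely for logging.

```python
import torch
import torch.nn.functional as F

# Sketch of a Lightning-style allsplit_step. The key point: the loss is
# computed and RETURNED here so backpropagation happens; update() only logs.
def allsplit_step(self, split, batch, batch_idx):
    pred = self(batch)                      # model forward pass
    target = batch["vertice"]

    vertice_recon = F.mse_loss(pred, target)
    # Velocity term: frame-to-frame differences along the time axis.
    vertice_reconv = F.mse_loss(pred[:, 1:] - pred[:, :-1],
                                target[:, 1:] - target[:, :-1])
    loss = vertice_recon + vertice_reconv

    self.metrics[split].update(pred, target)  # record only; return ignored
    self.log(f"{split}_loss", loss, prog_bar=True)

    # Returning the graph-attached tensor (not None) enables backprop.
    return loss
```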
Hello. Thank you for your outstanding work. However, I am having some problems reproducing the training portion of the code and am not getting the expected results. With the original code, all the losses appear as NaN, as shown in the attached log.
I tried modifying the loss function a bit; the losses are no longer NaN, but it seems there is no backpropagation (a quick way to check this is sketched below). All parameters use the default training settings, except that batch_size was changed from 36 to 24.
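One generic PyTorch sanity check for the "no backpropagation" symptom (model, batch, and the step call here are hypothetical placeholders, not this repo's API): run a single step manually and verify the loss is attached to the graph and that gradients actually reach the parameters.

```python
# Run one training step by hand and inspect the result.
loss = model.training_step(batch, 0)  # or the overridden allsplit_step
assert loss is not None and loss.requires_grad, \
    "loss is detached from the graph (e.g. step returned None or a float)"

loss.backward()
grads = [p.grad for p in model.parameters() if p.requires_grad]
assert any(g is not None and g.abs().sum() > 0 for g in grads), \
    "no gradients flowed back into the model"
```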