On SG2 and SG3, given you use them on a modified fork, you can resume training with a completely different batch size and still keep your tick / kimg progress by specifying it with the "--nkimg" kwarg. For example, --nkimg=2500 resumes training with an assumed progress of 2500 kimg.
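To make clear why that works, here is a rough Python sketch of the counter behaviour (this is not the official SG2/SG3 code, just how a fork like that can seed the counters; the 4 kimg-per-tick default matches SG2-ADA, everything else is illustrative):

```python
# Rough sketch: progress is tracked in real images seen, so seeding the counter from
# --nkimg keeps the tick / kimg schedule intact no matter what batch size you resume with.
def training_loop(resume_kimg=0, batch_size=64, total_kimg=25000, kimg_per_tick=4):
    cur_nimg = resume_kimg * 1000               # --nkimg=2500 -> counters start at 2500 kimg
    cur_tick = resume_kimg // kimg_per_tick     # tick counter continues too (fork behaviour)
    tick_start_nimg = cur_nimg
    while cur_nimg < total_kimg * 1000:
        # ... one optimizer step on `batch_size` images would go here ...
        cur_nimg += batch_size                  # counted in images, not iterations, so
                                                # changing the batch size doesn't reset it
        if cur_nimg - tick_start_nimg >= kimg_per_tick * 1000:
            cur_tick += 1
            tick_start_nimg = cur_nimg
            print(f'tick {cur_tick}  kimg {cur_nimg / 1000:.1f}')
```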
SGXL resets kimg to 0 if you change the batch size
I found it's extremely useful to start with a very low batch size such as --batch=2 --glr=0.0008 --dlr=0.0006 to improve diversity, and then switch to a batch size of 32 / 64 / 128 for better FID once FID starts to bottom out at batch=2.
However, because the augmentation state, the G epoch, and kimg all reset in SGXL when doing this, I am having a really bad time.
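For anyone trying to reproduce the workflow, this is roughly what the two phases look like on a fork that exposes --glr / --dlr / --nkimg (those three flags are the fork's, as described above, not the upstream CLI; the paths, gpu count, and the 2500 kimg switch point are only placeholders):

```python
# Very rough illustration of the two-phase run; not an official recipe.
import subprocess

# Phase 1: tiny batch for diversity.
subprocess.run([
    'python', 'train.py', '--outdir=runs', '--gpus=1', '--data=dataset.zip',
    '--batch=2', '--glr=0.0008', '--dlr=0.0006',
], check=True)

# Phase 2: once FID bottoms out (say around 2500 kimg), resume the last snapshot with a
# bigger batch and carry the counters forward instead of restarting from kimg 0.
subprocess.run([
    'python', 'train.py', '--outdir=runs', '--gpus=1', '--data=dataset.zip',
    '--batch=64', '--resume=runs/path-to-last-snapshot.pkl', '--nkimg=2500',
], check=True)
```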
This method of training can push recall to 0.7+ instead of 0.5 on many datasets while still reaching the same FID (after also bottoming out FID at batch size 128), so recall is tremendously better with this method. With batch=2 it also converges much faster, so the first part of training, which focuses on diversity, is very fast.
This seems to favor SG2-ADA way more than SGXL; SGXL can easily collapse with low batch sizes, so it's hard to tame. Still, a batch size of 2-16 for the first 144 kimg and then switching to a batch size of 64 or 128 seems beneficial on unimodal datasets for better recall and faster training.
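Written out, the schedule I'm suggesting is just this (numbers from the runs above; in practice the switch is done by stopping and resuming, not inside one run):

```python
# Sketch of the suggested schedule: small batch for the early diversity phase,
# large batch once FID starts to plateau. Defaults here are illustrative.
def batch_size_for(cur_nimg, switch_kimg=144, small_batch=8, large_batch=64):
    """Small batch (2-16) for the first `switch_kimg` kimg, 64/128 afterwards."""
    return small_batch if cur_nimg < switch_kimg * 1000 else large_batch
```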