The effective batch size is always per_device_batch_size * gradient_accumulation_steps (see the sketch below).

From my testing with gradient_accumulation_steps = 4 and num_epochs = 20:

Before the fix: training ran for 80 epochs, because max_steps was computed as 80 and max_steps overrides num_epochs. Training took very long.

After the fix: training finishes in 23 epochs, which is much closer to the 20 specified, and training time is reduced accordingly.
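
For reference, a minimal sketch of the intended relationship between gradient accumulation and max_steps (names and numbers here are illustrative, not the exact code in this PR):

```python
import math

# Illustrative values only, not the exact config used in the test above.
dataset_size = 320                 # number of training examples
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_epochs = 20

# One optimizer step consumes effective_batch_size examples.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Optimizer steps needed for one pass over the dataset.
steps_per_epoch = math.ceil(dataset_size / effective_batch_size)

# max_steps derived from num_epochs; without dividing by
# gradient_accumulation_steps this value ends up 4x too large,
# which is why training previously ran for roughly 4x the requested epochs.
max_steps = steps_per_epoch * num_epochs
print(effective_batch_size, steps_per_epoch, max_steps)
```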
From this blog https://lightning.ai/blog/gradient-accumulation/: if we want a batch size of 256 but can only fit a batch size of 64 into GPU memory, we can perform gradient accumulation over four batches of size 64. (After processing all four batches, we will have the accumulated gradients equivalent to a single batch of size 256.)
The same strategy is applied here as well: https://discuss.huggingface.co/t/how-do-you-calculate-max-steps/40177
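
To illustrate the accumulation equivalence from the quote above, here is a small self-contained PyTorch sketch (toy model and sizes, purely for illustration, not code from this repo):

```python
import torch
from torch import nn

# Toy setup: model, data, and sizes are illustrative only.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accumulation_steps = 4    # 4 micro-batches of 64 ~= one batch of 256
micro_batch = 64

data = torch.randn(256, 8)
targets = torch.randn(256, 1)

optimizer.zero_grad()
for step in range(accumulation_steps):
    x = data[step * micro_batch:(step + 1) * micro_batch]
    y = targets[step * micro_batch:(step + 1) * micro_batch]
    loss = loss_fn(model(x), y)
    # Scale the loss so the accumulated gradient matches the average
    # gradient over the full batch of 256.
    (loss / accumulation_steps).backward()

# A single optimizer step after all micro-batches: one "effective" batch
# of 256, i.e. one step toward max_steps, not four.
optimizer.step()
optimizer.zero_grad()
```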