The effective batch size is always per_device_batch_size * gradient_accumulation_steps (see the sketch below).

From my testing with gradient_accumulation_steps = 4 and num_epochs = 20:

Before the fix: training ran for 80 epochs, because max_steps was computed as 80 and max_steps overrides num_epochs. Training took very long.

After the fix: training finishes in 23 epochs, which is much closer to the 20 specified, and training time is reduced accordingly.
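
For reference, a minimal sketch of the intended relationship between gradient accumulation and max_steps (names and numbers here are illustrative, not the exact code in this PR):

```python
import math

# Illustrative values only, not the exact config used in the test above.
dataset_size = 320                 # number of training examples
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_epochs = 20

# One optimizer step consumes effective_batch_size examples.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Optimizer steps needed for one pass over the dataset.
steps_per_epoch = math.ceil(dataset_size / effective_batch_size)

# max_steps derived from num_epochs; without dividing by
# gradient_accumulation_steps this value ends up 4x too large,
# which is why training previously ran for roughly 4x the requested epochs.
max_steps = steps_per_epoch * num_epochs
print(effective_batch_size, steps_per_epoch, max_steps)
```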
From this blog https://lightning.ai/blog/gradient-accumulation/: if we want a batch size of 256 but can only fit a batch size of 64 into GPU memory, we can perform gradient accumulation over four batches of size 64. (After processing all four batches, we will have the accumulated gradients equivalent to a single batch of size 256.)
The same strategy is applied here as well: https://discuss.huggingface.co/t/how-do-you-calculate-max-steps/40177
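
To illustrate the accumulation equivalence from the quote above, here is a small self-contained PyTorch sketch (toy model and sizes, purely for illustration, not code from this repo):

```python
import torch
from torch import nn

# Toy setup: model, data, and sizes are illustrative only.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accumulation_steps = 4    # 4 micro-batches of 64 ~= one batch of 256
micro_batch = 64

data = torch.randn(256, 8)
targets = torch.randn(256, 1)

optimizer.zero_grad()
for step in range(accumulation_steps):
    x = data[step * micro_batch:(step + 1) * micro_batch]
    y = targets[step * micro_batch:(step + 1) * micro_batch]
    loss = loss_fn(model(x), y)
    # Scale the loss so the accumulated gradient matches the average
    # gradient over the full batch of 256.
    (loss / accumulation_steps).backward()

# A single optimizer step after all micro-batches: one "effective" batch
# of 256, i.e. one step toward max_steps, not four.
optimizer.step()
optimizer.zero_grad()
```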