Replies: 4 comments · 5 replies
-
Isn't
-
In the second trial, increasing regularization, i.e., increasing the weight decay and applying early stopping (especially the latter), seems to have helped, IMHO.
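For reference, a minimal sketch of how that could be wired up with the transformers Trainer, assuming the training script (e.g. run_clip.py) is edited to register the callback. The patience, step counts, output_dir, and weight decay values below are illustrative only, and `model`, `train_dataset`, and `eval_dataset` are assumed to come from the surrounding script:

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

# Illustrative values; only the fields relevant to early stopping are shown.
training_args = TrainingArguments(
    output_dir="clip-early-stop",        # hypothetical output dir
    evaluation_strategy="steps",
    eval_steps=1000,                     # evaluate often enough for the callback to react
    save_strategy="steps",
    save_steps=1000,                     # must line up with eval_steps when loading the best model
    load_best_model_at_end=True,         # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    weight_decay=0.2,                    # the increased weight decay tried in the second trial
)

trainer = Trainer(
    model=model,                         # assumed to be defined by the surrounding script
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    # Stop training if eval_loss has not improved for 3 consecutive evaluations.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```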
-
Yes, but excessive weight decay or overly aggressive early stopping is not a good approach either.
-
I am training a CLIP model, but the eval_loss is diverging after 25K steps. Here are a few scenarios I have tried.

This was trained with the following config:
```bash
CUDA_VISIBLE_DEVICES="0, 1, 2, 3" accelerate launch --mixed_precision="fp16" run_clip.py \
    --max_grad_norm 0.9 \
    --num_train_epochs 1500 \
    --output_dir ~/clip-sa-base4 \
    --model_name_or_path /home/user/clip-sa2 \
    --tokenizer_name /home/user/clip-sanskrit-data/checkpoint-37920000 \
    --train_file /home/user/clip-sanskrit-data/annotations/train2017.json \
    --validation_file /home/user/data/annotations/captions_val2017.json \
    --image_column image_path \
    --caption_column captions \
    --remove_unused_columns=False \
    --torch_compile=True \
    --load_best_model_at_end=True \
    --evaluation_strategy "steps" \
    --save_strategy "steps" \
    --dataloader_drop_last True \
    --save_total_limit 10 \
    --no_cuda False \
    --do_train --do_eval \
    --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
    --overwrite_output_dir
```
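For readers less familiar with these flags, they map roughly onto transformers.TrainingArguments as sketched below. This is an illustrative mapping only, under the assumption that run_clip.py parses them with HfArgumentParser; data- and model-specific flags such as --image_column are handled by the script itself and omitted here:

```python
from transformers import TrainingArguments

# Rough equivalent of the launch flags above; values copied from the command line.
training_args = TrainingArguments(
    output_dir="~/clip-sa-base4",
    num_train_epochs=1500,
    learning_rate=5e-5,
    weight_decay=0.1,
    max_grad_norm=0.9,
    warmup_steps=0,
    fp16=True,                    # from accelerate's --mixed_precision="fp16"
    evaluation_strategy="steps",
    save_strategy="steps",
    save_total_limit=10,
    load_best_model_at_end=True,
    dataloader_drop_last=True,
    remove_unused_columns=False,
    torch_compile=True,
    no_cuda=False,
    do_train=True,
    do_eval=True,
    overwrite_output_dir=True,
)
```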
For the second trial, I added a max_grad_norm of 0.9 and a weight decay of 0.2.

3rd run: changed the learning rate to 5e-7, the weight decay to 0.1, and max_grad_norm to 1.

Can somebody help me with how to reduce or eliminate this overfitting? TIA