Optionally Reduce LR on plateau #25
Draft
johnlockejrr wants to merge 64 commits into qurator-spk:master from johnlockejrr:ReduceLROnPlateau
avoid ensembling if no model weights met the threshold f1 score in the case of classification
…n an output directory with the same file name
…d as an argument for inference
…d as a class for layout segmentation
…in label images after artificial label
…no patch augmentation
…herwise can give a cv2 error
Changed unsafe basename extraction: `file_name = i.split('.')[0]` to `file_name = os.path.splitext(i)[0]` and `filename = n[i].split('.')[0]` to `filename = os.path.splitext(n[i])[0]`, because `Vat.sam.2_206.jpg` was reduced to `Vat` instead of `Vat.sam.2_206`
Safely keep the full basename without the extension
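The difference between the two extraction styles can be seen in a small standalone sketch. The helper names `stem_by_split` and `stem_by_splitext` are hypothetical and exist only for illustration; the PR itself changes the expressions inline.

```python
import os


def stem_by_split(name: str) -> str:
    """Old behaviour: keep only the text before the first dot."""
    return name.split('.')[0]


def stem_by_splitext(name: str) -> str:
    """New behaviour: strip only the final extension."""
    return os.path.splitext(name)[0]


if __name__ == "__main__":
    name = "Vat.sam.2_206.jpg"
    print(stem_by_split(name))     # everything after the first dot is lost
    print(stem_by_splitext(name))  # only ".jpg" is removed
```

Note that `os.path.splitext` removes just the last suffix, so a name such as `archive.tar.gz` would keep `.tar` in its stem.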
# Learning Rate Warmup and Optimization Implementation

## Overview

Added learning rate warmup functionality to improve training stability, especially when using pretrained weights. The implementation uses TensorFlow's native learning rate scheduling, so the rate is computed per optimizer step inside the schedule rather than by custom Python code.

## Changes Made

### 1. Configuration Updates (`runs/train_no_patches_448x448.json`)

Added new configuration parameters for warmup:

```json
{
    "warmup_enabled": true,
    "warmup_epochs": 5,
    "warmup_start_lr": 1e-6
}
```

### 2. Training Script Updates (`train.py`)

#### A. Optimizer and Learning Rate Schedule

- Replaced fixed learning rate with dynamic scheduling
- Implemented warmup using `tf.keras.optimizers.schedules.PolynomialDecay`
- Maintained compatibility with existing ReduceLROnPlateau and EarlyStopping

```python
if warmup_enabled:
    lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
        initial_learning_rate=warmup_start_lr,
        decay_steps=warmup_epochs * steps_per_epoch,
        end_learning_rate=learning_rate,
        power=1.0  # Linear ramp from warmup_start_lr up to learning_rate
    )
    optimizer = Adam(learning_rate=lr_schedule)
else:
    optimizer = Adam(learning_rate=learning_rate)
```

#### B. Learning Rate Behavior

- Initial learning rate: 1e-6 (configurable via `warmup_start_lr`)
- Target learning rate: 5e-5 (configurable via `learning_rate`)
- Linear increase over 5 epochs (configurable via `warmup_epochs`)
- After warmup, learning rate remains at target value until ReduceLROnPlateau triggers

## Benefits

1. Improved training stability during initial epochs
2. Better handling of pretrained weights
3. Efficient implementation using TensorFlow's native scheduling
4. Configurable through JSON configuration file
5. Maintains compatibility with existing callbacks (ReduceLROnPlateau, EarlyStopping)

## Usage

To enable warmup:

1. Set `warmup_enabled: true` in the configuration file
2. Adjust `warmup_epochs` and `warmup_start_lr` as needed
3. The warmup will automatically integrate with existing learning rate reduction and early stopping

To disable warmup:

- Set `warmup_enabled: false` or remove the warmup parameters from the configuration file
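To make the warmup behaviour concrete, the following is a minimal pure-Python sketch of the value a linear `PolynomialDecay` schedule (with `cycle=False`) would produce at each optimizer step. The function name `warmup_lr` and the figure of 100 steps per epoch are assumptions for illustration only; the defaults mirror the values quoted in the description above.

```python
def warmup_lr(step, start_lr=1e-6, target_lr=5e-5, warmup_steps=500, power=1.0):
    """Learning rate at a given optimizer step during and after warmup.

    Mirrors the documented PolynomialDecay formula:
        (initial - end) * (1 - step / decay_steps) ** power + end
    with the step clamped to decay_steps.
    """
    step = min(step, warmup_steps)
    remaining = 1.0 - step / warmup_steps
    return (start_lr - target_lr) * remaining ** power + target_lr


if __name__ == "__main__":
    steps_per_epoch = 100  # hypothetical value, not taken from the PR
    warmup_steps = 5 * steps_per_epoch
    for epoch in range(7):
        lr = warmup_lr(epoch * steps_per_epoch, warmup_steps=warmup_steps)
        print(f"start of epoch {epoch}: lr = {lr:.2e}")
```

Because the start rate is lower than the end rate, the "decay" term shrinks toward zero and the rate rises linearly, then stays at the target once the step count passes `warmup_steps`.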
# Training Script Improvements

## Learning Rate Management Fixes

### 1. ReduceLROnPlateau Implementation

- Fixed the learning rate reduction mechanism by replacing the manual epoch loop with a single `model.fit()` call
- This ensures proper tracking of validation metrics across epochs
- Configured with:

```python
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.2,      # More aggressive reduction
    patience=3,      # Quick response to plateaus
    min_lr=1e-6,     # Minimum learning rate
    min_delta=1e-5,  # Minimum change to be considered improvement
    verbose=1
)
```

### 2. Warmup Implementation

- Added learning rate warmup using TensorFlow's native scheduling
- Gradually increases learning rate from 1e-6 to target (2e-5) over 5 epochs
- Helps stabilize initial training phase
- Implemented using `PolynomialDecay` schedule:

```python
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=warmup_start_lr,
    decay_steps=warmup_epochs * steps_per_epoch,
    end_learning_rate=learning_rate,
    power=1.0  # Linear ramp from warmup_start_lr up to learning_rate
)
```

### 3. Early Stopping

- Added early stopping to prevent overfitting
- Configured with:

```python
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=6,
    restore_best_weights=True,
    verbose=1
)
```

## Model Saving Improvements

### 1. Epoch-based Model Saving

- Implemented custom `ModelCheckpointWithConfig` to save both model and config
- Saves after each epoch with corresponding config.json
- Maintains compatibility with original script's saving behavior

### 2. Best Model Saving

- Saves the best model at training end
  - If early stopping triggers: saves the best model from training
  - If no early stopping: saves the final model

## Configuration

All parameters are configurable through the JSON config file:

```json
{
    "reduce_lr_enabled": true,
    "reduce_lr_monitor": "val_loss",
    "reduce_lr_factor": 0.2,
    "reduce_lr_patience": 3,
    "reduce_lr_min_lr": 1e-6,
    "reduce_lr_min_delta": 1e-5,
    "early_stopping_enabled": true,
    "early_stopping_monitor": "val_loss",
    "early_stopping_patience": 6,
    "early_stopping_restore_best_weights": true,
    "warmup_enabled": true,
    "warmup_epochs": 5,
    "warmup_start_lr": 1e-6
}
```

## Benefits

1. More stable training with proper learning rate management
2. Better handling of training plateaus
3. Automatic saving of best model
4. Maintained compatibility with existing config saving
5. Improved training monitoring and control
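As a rough illustration of what the plateau settings above imply, here is a simplified pure-Python simulation of a reduce-on-plateau rule using the same `factor`, `patience`, `min_lr` and `min_delta` values. It is a sketch, not the Keras callback: `simulate_reduce_on_plateau` is a hypothetical helper, cooldown is ignored, and the validation losses are made up.

```python
def simulate_reduce_on_plateau(val_losses, lr, factor=0.2, patience=3,
                               min_lr=1e-6, min_delta=1e-5):
    """Return the learning rate in effect after each epoch.

    An epoch counts as an improvement only if the loss drops below the
    best value seen so far by more than min_delta.
    """
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Shrink the rate, but never below the configured floor.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


if __name__ == "__main__":
    losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.8]  # invented values
    for epoch, lr in enumerate(simulate_reduce_on_plateau(losses, lr=2e-5)):
        print(f"epoch {epoch}: lr = {lr:.1e}")
```

With `patience=3`, the rate drops only after three consecutive epochs without a sufficient improvement. The real callback changes the optimizer's learning rate directly, so whether it can do this when the optimizer was constructed with a `LearningRateSchedule` (as in the warmup code) may depend on the installed TensorFlow/Keras version and is worth verifying.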
With
We'll see at training's end.