Description
🚀 Feature
In order to fix bugs, we are sometimes forced to introduce BC-breaking changes. While the process for such introductions is clear when it comes to code changes, it is not when it comes to model weights or hyper-parameters. Thus we should define when, why and how to introduce BC-breaking changes to model weights or model hyper-parameters.
Motivation
We have recently bumped into a few issues that motivate this. Here are a few examples:
- In Feature Pyramid Network code bug #2326 we discovered a bug in the initialization of some weights of all detection models. If we fix the bug in the code, we should probably retrain the models. What happens if their accuracy improves? How do we make the new weights available to our users?
- How do we handle cases such as Change default value of eps in FrozenBatchNorm to match BatchNorm #2599, where in order to fix a bug we need to update the hyper-parameters of the model?
Approaches
There are quite a few different approaches to this:
- Replace the old parameters and inform the community about the BC-breaking changes. Example: [DONOTMERGE] Update the accuracy metrics of detection models #2942
  - Reasonable approach when the accuracy improvement is substantial or the effect on the model behaviour is negligible.
  - Keeps the code-base clean from workarounds and minimizes the number of weights we provide.
  - Can potentially cause issues for users who rely on the pre-trained weights for transfer learning.
- Write code/workarounds to minimize the effect of the changes on existing models (see the first sketch after this list). Example: Overwriting FrozenBN eps=0.0 if pretrained=True for detection models. #2940
  - Reasonable approach when the changes lead to a slight decrease in accuracy.
  - Minimizes the effect on users who rely on the pre-trained models.
  - Introduces ugly workarounds in the code and increases the number of weights we provide.
- Introduce versioning on model weights (see the second sketch after this list):
  - Appropriate when introducing significant changes to the models.
  - Keeps the code-base clean from workarounds.
  - Forces us to maintain multiple versions of the weights and the model config.
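
To make the workaround approach concrete, below is a minimal sketch of the pattern implied by #2940, assuming torchvision's `FrozenBatchNorm2d`. The `_override_frozen_bn_eps` helper and the `tiny_detector` builder are hypothetical illustrations, not the actual torchvision code:

```python
import torch
from torch import nn
from torchvision.ops.misc import FrozenBatchNorm2d

def _override_frozen_bn_eps(model: nn.Module, eps: float) -> None:
    # Hypothetical helper: walk the model and reset eps on every
    # FrozenBatchNorm2d so weights trained under the old value keep
    # their original numerical behaviour.
    for module in model.modules():
        if isinstance(module, FrozenBatchNorm2d):
            module.eps = eps

def tiny_detector(pretrained: bool = False) -> nn.Module:
    # Hypothetical stand-in for a torchvision detection model builder.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        FrozenBatchNorm2d(8),
    )
    if pretrained:
        # ... load the old checkpoint here ...
        # The old weights were trained with eps=0.0, so restore it to
        # avoid silently changing their outputs; freshly trained models
        # keep the new non-zero default.
        _override_frozen_bn_eps(model, eps=0.0)
    return model

x = torch.randn(1, 3, 32, 32)
out = tiny_detector(pretrained=True)(x)  # legacy behaviour preserved
```

The cost is visible in the `if pretrained:` branch: every BC-breaking fix adds another special case that the model builders have to carry indefinitely.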
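For the versioning approach, a weight registry could look roughly like the sketch below. The `WeightInfo` structure, the version keys and the URLs are all hypothetical placeholders; the point is only to show that every BC-breaking retrain adds an entry that must remain hosted and documented:

```python
from typing import NamedTuple

class WeightInfo(NamedTuple):
    url: str    # where the checkpoint is hosted (placeholder URLs below)
    eps: float  # hyper-parameter the checkpoint was trained with

# Hypothetical registry: one entry per published version of the weights.
FASTER_RCNN_WEIGHTS = {
    "v1": WeightInfo(url="https://example.com/frcnn_v1.pth", eps=0.0),
    "v2": WeightInfo(url="https://example.com/frcnn_v2.pth", eps=1e-5),
}

DEFAULT_VERSION = "v2"

def get_weight_info(version: str = DEFAULT_VERSION) -> WeightInfo:
    # New code defaults to the latest weights, while users who
    # fine-tuned against v1 can still pin the old behaviour explicitly.
    return FASTER_RCNN_WEIGHTS[version]
```

This keeps the model builders free of special-casing, at the price of maintaining and hosting every version for as long as users depend on it.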
It's worth discussing whether we want to adapt our approach depending on the characteristics of the problem, or whether we want to go with a single approach for all cases. Moreover, it's worth investigating whether changes to weights should be handled differently from changes to hyper-parameters used at inference.