Description
🚀 Feature
In order to fix bugs, we are sometimes forced to introduce BC-breaking changes. While the process for such introductions is clear when it comes to code changes, it is not when it comes to model weights or hyper-parameters. Thus we should define when, why and how to introduce BC-breaking changes to model weights or model hyper-parameters.
Motivation
We have recently bumped into a few issues that motivate this. Here are a few examples:
- In Feature Pyramid Network code bug #2326 we discovered a bug in the initialization of some weights of all detection models. If we fix the bug in the code, we should probably retrain the models. What happens if their accuracy improves? How do we make the new weights available to our users?
- How do we handle cases such as Change default value of eps in FrozenBatchNorm to match BatchNorm #2599, where in order to fix a bug we need to update the hyper-parameters of the model?
Approaches
There are quite a few different approaches to this:
- Replace the old parameters and inform the community about the BC-breaking changes. Example: [DONOTMERGE] Update the accuracy metrics of detection models #2942
  - Reasonable approach when the accuracy improvement is substantial or the effect on the model behaviour is negligible.
  - Keeps the code-base clean from workarounds and minimizes the number of weights we provide.
  - Can potentially cause issues for users who rely on the pre-trained weights for transfer learning.
- Write code/workarounds to minimize the effect of the changes on existing models (see the first sketch after this list). Example: Overwriting FrozenBN eps=0.0 if pretrained=True for detection models. #2940
  - Reasonable approach when the changes lead to a slight decrease in accuracy.
  - Minimizes the effect on users who rely on the pre-trained models.
  - Introduces ugly workarounds in the code and increases the number of weights we provide.
- Introduce versioning on model weights (see the second sketch after this list):
  - Appropriate when introducing significant changes to the models.
  - Keeps the code-base clean from workarounds.
  - Forces us to maintain multiple versions of the weights and the model config.
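
To make the workaround approach concrete, below is a minimal sketch of the pattern implied by #2940, assuming torchvision's `FrozenBatchNorm2d`. The `_override_frozen_bn_eps` helper and the `tiny_detector` builder are hypothetical illustrations, not the actual torchvision code:

```python
import torch
from torch import nn
from torchvision.ops.misc import FrozenBatchNorm2d

def _override_frozen_bn_eps(model: nn.Module, eps: float) -> None:
    # Hypothetical helper: walk the model and reset eps on every
    # FrozenBatchNorm2d so weights trained under the old value keep
    # their original numerical behaviour.
    for module in model.modules():
        if isinstance(module, FrozenBatchNorm2d):
            module.eps = eps

def tiny_detector(pretrained: bool = False) -> nn.Module:
    # Hypothetical stand-in for a torchvision detection model builder.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        FrozenBatchNorm2d(8),
    )
    if pretrained:
        # ... load the old checkpoint here ...
        # The old weights were trained with eps=0.0, so restore it to
        # avoid silently changing their outputs; freshly trained models
        # keep the new non-zero default.
        _override_frozen_bn_eps(model, eps=0.0)
    return model

x = torch.randn(1, 3, 32, 32)
out = tiny_detector(pretrained=True)(x)  # legacy behaviour preserved
```

The cost is visible in the `if pretrained:` branch: every BC-breaking fix adds another special case that the model builders have to carry indefinitely.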
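For the versioning approach, a weight registry could look roughly like the sketch below. The `WeightInfo` structure, the version keys and the URLs are all hypothetical placeholders; the point is only to show that every BC-breaking retrain adds an entry that must remain hosted and documented:

```python
from typing import NamedTuple

class WeightInfo(NamedTuple):
    url: str    # where the checkpoint is hosted (placeholder URLs below)
    eps: float  # hyper-parameter the checkpoint was trained with

# Hypothetical registry: one entry per published version of the weights.
FASTER_RCNN_WEIGHTS = {
    "v1": WeightInfo(url="https://example.com/frcnn_v1.pth", eps=0.0),
    "v2": WeightInfo(url="https://example.com/frcnn_v2.pth", eps=1e-5),
}

DEFAULT_VERSION = "v2"

def get_weight_info(version: str = DEFAULT_VERSION) -> WeightInfo:
    # New code defaults to the latest weights, while users who
    # fine-tuned against v1 can still pin the old behaviour explicitly.
    return FASTER_RCNN_WEIGHTS[version]
```

This keeps the model builders free of special-casing, at the price of maintaining and hosting every version for as long as users depend on it.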
It's worth discussing whether we want to adapt our approach depending on the characteristics of the problem, or whether we want to go with a single approach for all cases. Moreover, it's worth investigating whether changes to weights should be handled differently from changes to hyper-parameters used at inference.