
Support computing custom scores and terminating/saving based on them in BaseTrainer #1202

Merged
4 commits merged into thu-ml:master on Aug 14, 2024

Conversation

@anyongjin (Contributor) commented Aug 13, 2024

This PR introduces a new concept into tianshou training: a best_score. It is computed from the appropriate Stats instance and always added to InfoStats.

Breaking Changes:

  • InfoStats has a new non-optional field best_score

Background

Currently, tianshou uses the maximum average return to select the best model. However, this does not always match user needs: for example, a model whose average return is only 5% lower but whose standard deviation is 50% lower is generally considered more stable and therefore better.
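To illustrate the kind of scoring this enables, here is a minimal sketch of a custom score function that penalizes the spread of test returns. It assumes the `compute_score_fn` hook introduced by this PR receives the test `CollectStats` and that `returns_stat` exposes `mean` and `std`; the 0.5 weight is an arbitrary example choice, not part of the PR.

```python
from tianshou.data import CollectStats


def score_fn(stat: CollectStats) -> float:
    # Example score: a high mean return is good, a large standard
    # deviation across test episodes is penalized (weight 0.5 is arbitrary).
    return stat.returns_stat.mean - 0.5 * stat.returns_stat.std
```

Passing such a function to the trainer as `compute_score_fn` would then make best-model selection and early stopping follow this score instead of the raw average return.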

@MischaPanch (Collaborator) left a comment

Thanks for the PR @anyongjin, it's a good contribution!

Overall, the trainer has to become more flexible, but that would be too much to ask for right now. I think we can merge this after some slight changes and then refactor the trainer soon, taking into consideration support for custom scoring and custom conditions for terminating the training.

tianshou/trainer/base.py: 4 review threads (outdated, resolved)
@anyongjin (Contributor, Author)

In essence, average reward and test score are two different things. The former is a fixed indicator of the test result; the latter is a score assigned to that result, and the scoring logic may differ between tasks and users (some take the standard deviation into account, some do not).
Currently, tianshou uses best_reward for both the average reward and the test score, which makes it difficult for users to implement custom scoring logic. So I suggest that best_reward be used only for the average reward, and that best_score be added for the test score, keeping the two concepts separate. If the field were called best_custom_score, people might think there is also a system default score field, so I think it is better not to add 'custom'.

Update:

  • Added explanation for InfoStats.best_score.
  • Fall back to a lambda when compute_score_fn is None, to avoid repeated if-else branches (see the sketch below).
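For reference, a minimal sketch of the fallback described in the second bullet; the attribute and variable names are assumptions and the merged code may differ.

```python
# Inside the trainer setup: if no custom scorer is supplied, default to
# the mean test return, so best_score reproduces the old behaviour of
# best_reward without branching at every use site.
if compute_score_fn is None:
    compute_score_fn = lambda stat: stat.returns_stat.mean
self.compute_score_fn = compute_score_fn
```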

@MischaPanch changed the title from "add evaluate_test_fn to BaseTrainer (Calculate the test batch performance score to determine whether it is the best model)" to "Support computing custom scores and terminating/saving based on them in BaseTrainer" on Aug 14, 2024
@MischaPanch merged commit a38e586 into thu-ml:master on Aug 14, 2024
4 checks passed
@anyongjin mentioned this pull request on Aug 14, 2024