-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/refactor collector #1063
Feat/refactor collector #1063
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first review, I haven't looked at the implementation of collect in detail here yet.
I have the feeling more reset methods can disappear from the collector if we just reset the env during .collect
(which can be done optionally, I guess)
Failing pipeline needs attention before the next review |
Still an off-by-one error somewhere
Previously the call to iter would reset the buffer, but some tests and scripts rely on collecting transitions prior to looping over the trainer
1. Call reset on collector before collecting 2. Collector logic has changed if one first collects some steps and then collects n_episodes. In the prior implementations, it would not reset the env, and thus not collect the desired number of full episodes, instead starting wherever it already was
…FER. IMPORTANT - this needs fixing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one minor comment left
@Trinkle23897 this is ready for review now, @bordeauxred and I worked on this together, so it's reviewed from my side. There are only very minor breaking changes, see PR description Overall, I think it's a good step towards a more readable and transparent Collector. We have encountered many issues while working on that, so there are many new TODOs. We'll address them later on, and I'll add a new notebook documenting "gotchas" in Batch and Buffer in a separate PR. I think a quick review from your side should be enough, understanding all details of the changes would be time consuming. Long story short: collector implementations rely less on state, are better typed and are more readable now. Hacky batch things have been marked as such explicitly |
I'll merge this now since it should be uncontroversial and it is blocking further development like #1077 . @Trinkle23897 if you have any comments, we can address them post-merge |
Thanks for doing that! |
Closes: thu-ml#1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063 - `Collector`s can now be closed, and their reset is more granular. thu-ml#1063 - Trainers can control whether collectors should be reset prior to training. thu-ml#1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063 - Improved typing for `exploration_noise` and within Collector. thu-ml#1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . thu-ml#1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063 --------- Co-authored-by: Michael Panchenko <[email protected]>
Closes: thu-ml#1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063 - `Collector`s can now be closed, and their reset is more granular. thu-ml#1063 - Trainers can control whether collectors should be reset prior to training. thu-ml#1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063 - Improved typing for `exploration_noise` and within Collector. thu-ml#1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . thu-ml#1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063 --------- Co-authored-by: Michael Panchenko <[email protected]>
Closes: thu-ml#1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063 - `Collector`s can now be closed, and their reset is more granular. thu-ml#1063 - Trainers can control whether collectors should be reset prior to training. thu-ml#1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063 - Improved typing for `exploration_noise` and within Collector. thu-ml#1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . thu-ml#1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063 --------- Co-authored-by: Michael Panchenko <[email protected]>
Closes: thu-ml#1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063 - `Collector`s can now be closed, and their reset is more granular. thu-ml#1063 - Trainers can control whether collectors should be reset prior to training. thu-ml#1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063 - Improved typing for `exploration_noise` and within Collector. thu-ml#1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . thu-ml#1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063 --------- Co-authored-by: Michael Panchenko <[email protected]>
Closes: #1058
Description (copied from newly established Changelog):
Api Extensions
to_dict
andto_list_of_dicts
. Feat/refactor collector #1063Collector
s can now be closed, and their reset is more granular. Feat/refactor collector #1063CollectStats
calledwith_autogenerated_stats
. Feat/refactor collector #1063Internal Improvements
Collector
s rely less on state, the few stateful things are stored explicitly instead of through a.data
attribute. Feat/refactor collector #1063Collector
s. Feat/refactor collector #1063exploration_noise
and within Collector. Feat/refactor collector #1063Breaking Changes
.data
attribute fromCollector
and its child classes. Feat/refactor collector #1063reset
expicitly or pass
reset_before_collect=True
. Feat/refactor collector #1063iter(Batch(...)
which now behaves the same way asBatch(...).__iter__()
. Can be considered a bugfix. Feat/refactor collector #1063