Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/refactor collector #1063

Merged
merged 80 commits into from
Mar 28, 2024

Conversation

bordeauxred
Copy link
Contributor

@bordeauxred bordeauxred commented Feb 23, 2024

Closes: #1058

Description (copied from newly established Changelog):

Api Extensions

Internal Improvements

Breaking Changes

@bordeauxred
Copy link
Contributor Author

@MischaPanch

Copy link
Collaborator

@MischaPanch MischaPanch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first review, I haven't looked at the implementation of collect in detail here yet.

I have the feeling more reset methods can disappear from the collector if we just reset the env during .collect (which can be done optionally, I guess)

tianshou/data/collector.py Outdated Show resolved Hide resolved
tianshou/data/collector.py Outdated Show resolved Hide resolved
tianshou/data/collector.py Outdated Show resolved Hide resolved
tianshou/data/collector.py Outdated Show resolved Hide resolved
@MischaPanch
Copy link
Collaborator

Failing pipeline needs attention before the next review

bordeauxred and others added 25 commits March 7, 2024 11:31
Still an off-by-one error somewhere
Previously the call to iter would reset the buffer, but some tests and scripts rely on collecting
transitions prior to looping over the trainer
1. Call reset on collector before collecting
2. Collector logic has changed if one first collects
some steps and then collects n_episodes. In the prior implementations, it would not reset the env, and thus not collect the desired number of full episodes, instead starting wherever it already was
@MischaPanch MischaPanch added the refactoring No change to functionality label Mar 26, 2024
Copy link
Collaborator

@MischaPanch MischaPanch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor comment left

@MischaPanch MischaPanch marked this pull request as ready for review March 26, 2024 11:18
@MischaPanch
Copy link
Collaborator

MischaPanch commented Mar 26, 2024

@Trinkle23897 this is ready for review now, just the minor comment about replacing of overload by generics is left

@bordeauxred and I worked on this together, so it's reviewed from my side. There are only very minor breaking changes, see PR description

Overall, I think it's a good step towards a more readable and transparent Collector.

We have encountered many issues while working on that, so there are many new TODOs. We'll address them later on, and I'll add a new notebook documenting "gotchas" in Batch and Buffer in a separate PR.

I think a quick review from your side should be enough, understanding all details of the changes would be time consuming. Long story short: collector implementations rely less on state, are better typed and are more readable now. Hacky batch things have been marked as such explicitly

@MischaPanch
Copy link
Collaborator

I'll merge this now since it should be uncontroversial and it is blocking further development like #1077 . @Trinkle23897 if you have any comments, we can address them post-merge

@MischaPanch MischaPanch merged commit 4f65b13 into thu-ml:master Mar 28, 2024
4 checks passed
@Trinkle23897
Copy link
Collaborator

Thanks for doing that!

@bordeauxred bordeauxred deleted the feat/refactor_collector branch April 2, 2024 09:42
ZhengLi1314 pushed a commit to ZhengLi1314/tianshou_0.5.1 that referenced this pull request Apr 15, 2024
Closes: thu-ml#1058 

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular.
thu-ml#1063
- Trainers can control whether collectors should be reset prior to
training. thu-ml#1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
expicitly or pass `reset_before_collect=True` . thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. thu-ml#1063
- Fixed `iter(Batch(...)` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <[email protected]>
ZhengLi1314 pushed a commit to ZhengLi1314/tianshou_0.5.1 that referenced this pull request Apr 15, 2024
Closes: thu-ml#1058

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular.
thu-ml#1063
- Trainers can control whether collectors should be reset prior to
training. thu-ml#1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
expicitly or pass `reset_before_collect=True` . thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. thu-ml#1063
- Fixed `iter(Batch(...)` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <[email protected]>
ZhengLi1314 pushed a commit to ZhengLi1314/tianshou_0.5.1 that referenced this pull request Apr 15, 2024
Closes: thu-ml#1058

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular.
thu-ml#1063
- Trainers can control whether collectors should be reset prior to
training. thu-ml#1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
expicitly or pass `reset_before_collect=True` . thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. thu-ml#1063
- Fixed `iter(Batch(...)` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <[email protected]>
ZhengLi1314 pushed a commit to ZhengLi1314/tianshou_0.5.1 that referenced this pull request Apr 15, 2024
Closes: thu-ml#1058 

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular.
thu-ml#1063
- Trainers can control whether collectors should be reset prior to
training. thu-ml#1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
expicitly or pass `reset_before_collect=True` . thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. thu-ml#1063
- Fixed `iter(Batch(...)` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring No change to functionality
Projects
Development

Successfully merging this pull request may close these issues.

Remove data from state in Collector, and remove preprocess_fn there
3 participants