Feat/refactor collector #1063

bordeauxred · 2024-02-23T17:55:25Z

Closes: #1058

Description (copied from newly established Changelog):

Api Extensions

Batch received two new methods: `to_dict` and `to_list_of_dicts`. #1063
`Collector`s can now be closed, and their reset is more granular. #1063
Trainers can control whether collectors should be reset prior to training. #1063
Convenience constructor for `CollectStats` called `with_autogenerated_stats`. #1063
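
To illustrate what the two new Batch methods are for, here is a minimal sketch using a hypothetical `ToyBatch` stand-in rather than the real `Batch` class. It assumes that `to_dict` recursively converts nested batches into plain dicts and that `to_list_of_dicts` yields one dict per index; the actual signatures and edge-case behavior in Tianshou may differ.

```python
from typing import Any


class ToyBatch:
    """Hypothetical stand-in for Batch: named fields holding equal-length lists or nested ToyBatches."""

    def __init__(self, **fields: Any) -> None:
        self.fields = fields

    def __len__(self) -> int:
        # Length is taken from the first field (a nested ToyBatch delegates to its own first field).
        return len(next(iter(self.fields.values())))

    def __getitem__(self, index: int) -> "ToyBatch":
        # Index every field; nested ToyBatches are indexed recursively.
        return ToyBatch(**{key: value[index] for key, value in self.fields.items()})

    def to_dict(self) -> dict[str, Any]:
        # Assumed semantics: nested batches become nested plain dicts.
        return {
            key: value.to_dict() if isinstance(value, ToyBatch) else value
            for key, value in self.fields.items()
        }

    def to_list_of_dicts(self) -> list[dict[str, Any]]:
        # Assumed semantics: one plain dict per entry of the batch.
        return [self[i].to_dict() for i in range(len(self))]


batch = ToyBatch(obs=[1, 2], info=ToyBatch(step=[10, 20]))
as_dict = batch.to_dict()            # {"obs": [1, 2], "info": {"step": [10, 20]}}
as_rows = batch.to_list_of_dicts()   # [{"obs": 1, "info": {"step": 10}}, {"obs": 2, "info": {"step": 20}}]
```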

Internal Improvements

`Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. #1063
Introduced a first iteration of a naming convention for vars in `Collector`s. #1063
Generally improved readability of Collector code and associated tests (still quite some way to go). #1063
Improved typing for `exploration_noise` and within Collector. #1063

Breaking Changes

Removed `.data` attribute from `Collector` and its child classes. #1063
Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` explicitly or pass `reset_before_collect=True`. #1063
VectorEnvs now return an array of info-dicts on reset instead of a list. #1063
Fixed `iter(Batch(...))` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. #1063
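
For readers migrating existing scripts, the reset-related breaking change can be sketched with toy classes. `ToyEnv` and `ToyCollector` are hypothetical and only mimic the behavior described above; the names `reset` and `reset_before_collect` come from the description, while passing the flag to `collect` is an assumption of this sketch.

```python
from typing import Optional


class ToyEnv:
    """Hypothetical environment: reset() yields observation 0, step() increments it."""

    def __init__(self) -> None:
        self.obs: Optional[int] = None  # None means "never reset"

    def reset(self) -> int:
        self.obs = 0
        return self.obs

    def step(self) -> int:
        if self.obs is None:
            raise RuntimeError("step() called before reset()")
        self.obs += 1
        return self.obs


class ToyCollector:
    """Hypothetical collector that, as described above, does NOT reset the env on init."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env  # deliberately no env.reset() here

    def reset(self) -> None:
        self.env.reset()

    def collect(self, n_step: int, reset_before_collect: bool = False) -> list[int]:
        # Assumption: the flag is accepted by collect(); the real API may place it elsewhere.
        if reset_before_collect:
            self.reset()
        return [self.env.step() for _ in range(n_step)]


collector = ToyCollector(ToyEnv())

# Option 1: let collect() perform the reset.
first = collector.collect(n_step=2, reset_before_collect=True)  # [1, 2]

# Option 2: reset explicitly, then collect.
collector.reset()
second = collector.collect(n_step=1)  # [1]
```

Calling `collect` on a fresh `ToyCollector` without either option raises in this sketch, which mirrors why code that relied on the implicit reset at construction needs updating.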

bordeauxred · 2024-02-23T17:55:36Z

@MischaPanch

MischaPanch

A first review. I haven't looked at the implementation of collect in detail yet.

I have the feeling that more reset methods can disappear from the collector if we just reset the env during .collect (which can be done optionally, I guess).

tianshou/data/collector.py

MischaPanch · 2024-02-23T21:24:30Z

Failing pipeline needs attention before the next review

1. Call reset on collector before collecting
2. Collector logic has changed if one first collects some steps and then collects n_episodes. In prior implementations, it would not reset the env and thus would not collect the desired number of full episodes, instead starting wherever it already was
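
The second point can be made concrete with a toy example. Everything here (`FixedLengthEnv`, `collect_steps`, `collect_episodes`, the `reset_first` flag) is hypothetical and only illustrates why collecting episodes after collecting steps, without a reset in between, yields a partial first episode.

```python
class FixedLengthEnv:
    """Hypothetical environment whose episodes always last `episode_len` steps."""

    def __init__(self, episode_len: int) -> None:
        self.episode_len = episode_len
        self.t = 0  # position inside the current episode

    def reset(self) -> None:
        self.t = 0

    def step(self) -> bool:
        """Advance one step; return True when the episode ends (the env then restarts)."""
        self.t += 1
        done = self.t == self.episode_len
        if done:
            self.t = 0
        return done


def collect_steps(env: FixedLengthEnv, n_step: int) -> None:
    for _ in range(n_step):
        env.step()


def collect_episodes(env: FixedLengthEnv, n_episode: int, reset_first: bool) -> list[int]:
    """Return the number of transitions recorded for each collected episode."""
    if reset_first:
        env.reset()
    lengths: list[int] = []
    current = 0
    while len(lengths) < n_episode:
        current += 1
        if env.step():
            lengths.append(current)
            current = 0
    return lengths


env = FixedLengthEnv(episode_len=3)

collect_steps(env, n_step=2)  # leaves the env mid-episode
without_reset = collect_episodes(env, n_episode=2, reset_first=False)  # [1, 3]

collect_steps(env, n_step=2)  # mid-episode again
with_reset = collect_episodes(env, n_episode=2, reset_first=True)  # [3, 3]
```

Without the reset, the first "episode" only contains the tail of the episode that was already in progress.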

tianshou/policy/modelfree/dqn.py

MischaPanch

Just one minor comment left

MischaPanch · 2024-03-26T11:22:19Z

@Trinkle23897 this is ready for review now, ~~just the minor comment about replacing of overload by generics is left~~

@bordeauxred and I worked on this together, so it's reviewed from my side. There are only very minor breaking changes, see PR description

Overall, I think it's a good step towards a more readable and transparent Collector.

We have encountered many issues while working on that, so there are many new TODOs. We'll address them later on, and I'll add a new notebook documenting "gotchas" in Batch and Buffer in a separate PR.

I think a quick review from your side should be enough; understanding all details of the changes would be time-consuming. Long story short: collector implementations rely less on state, are better typed and are more readable now. Hacky batch things have been marked as such explicitly.

MischaPanch · 2024-03-28T17:01:50Z

I'll merge this now since it should be uncontroversial and it is blocking further development like #1077. @Trinkle23897 if you have any comments, we can address them post-merge.

Trinkle23897 · 2024-03-31T17:47:22Z

Thanks for doing that!

Closes: thu-ml#1058

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular. thu-ml#1063
- Trainers can control whether collectors should be reset prior to training. thu-ml#1063
- Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes
- Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` explicitly or pass `reset_before_collect=True`. thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063
- Fixed `iter(Batch(...))` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <[email protected]>

bordeauxred added 2 commits February 23, 2024 14:22

remove self.data, break async tests

a1e3908

Remove preprocess_fn, adapt tests.

72004a1

MischaPanch requested changes Feb 23, 2024

bordeauxred and others added 25 commits March 7, 2024 11:31

Removing cur_rollout_batch. WIP

17d7e8a

Renamings of vars in collect

e9a3278

minor rename

c6a707e

Adjusted return type in BasePolicy.forward

47bfa8c

More var renamings in collector

a5b3601

Enhanced comments in collector

6363af8

Formatting

7b37eb1

Collector bugfixes: pass info to obs batch, don't use len on an int

49d5648

Collector bugfix: move breakout of the loop to after updating collections

bad9696

Collector bugfix: missing () around walrus

db5b9a3

Still an off-by-one error somewhere

Renaming of env and policy used in test, added docstrings

d65d80d

Tests: removed useless repetition of test_collector

4641962

Tests: removed more useless repetitions in test_collector

7db9fca

Trainer: more control over buffer resetting, fixed call to iter

0282b81

Previously the call to iter would reset the buffer, but some tests and scripts rely on collecting transitions prior to looping over the trainer

Minor, aesthetic

fdede35

Collector: fix persistence between collect iterations, remove reset for n_episodes

7b041ad

Fixed collector test by reinstating data mutation AFTER ADDING TO BUFFER. IMPORTANT - this needs fixing!

0fa5148

Aesthetic

900df81

Collector: fixed bug introduced in refactoring in persistence of _pre_collect_obs

b2c43ec

Rename variable, fix mixup in collectory.collect docstring.

571ba45

Rename variables in AsyncCollector

cf58c7c

Rename id to env_ids in venv; reformat

32e8a34

fix typo and unnecessary mypy type ignore

c84d459

extract compute_action_policy_hidden in AsyncCollector

c4a0bec

Michael Panchenko added 8 commits March 25, 2024 17:19

Simplified nullable_slice (no need for @overload)

8757d65

Simplified nullable_slice (no need for @overload)

09cb1c6

Fixed iter for empty batch

bff8e7d

Renamed env_ids back to env_id, added a TODO

92a7467

Fixed iter in batch

0cde27b

Fixed test vectorenv

3f305ab

Protocols: removed accidental method implementation (copy-paste error)

5470a6c

Removed runtime checkable of ObsBatchProtocol

c04df12

MischaPanch reviewed Mar 26, 2024

Changelog [skip ci]

7e77d08

MischaPanch added the "refactoring" (No change to functionality) label Mar 26, 2024

MischaPanch assigned MischaPanch and bordeauxred Mar 26, 2024

MischaPanch approved these changes Mar 26, 2024

MischaPanch marked this pull request as ready for review March 26, 2024 11:18

MischaPanch requested a review from Trinkle23897 March 26, 2024 11:18

Refactor: replace overload in exploration_noise by generic

4f47f23

MischaPanch mentioned this pull request Mar 26, 2024

Add option to collect same number of episodes in each collector env #1046

Closed

8 tasks

dantp-ai mentioned this pull request Mar 28, 2024

Fix mypy issues in tests and examples #1077

Merged

MischaPanch merged commit 4f65b13 into thu-ml:master Mar 28, 2024
4 checks passed

bordeauxred deleted the feat/refactor_collector branch April 2, 2024 09:42
