Feat/refactor collector (thu-ml#1063)

Closes: thu-ml#1058

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular. thu-ml#1063
- Trainers can control whether collectors should be reset prior to training. thu-ml#1063
- Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state; the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes
- Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` explicitly or pass `reset_before_collect=True`. thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063
- Fixed `iter(Batch(...))`, which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <[email protected]>
ZhengLi1314 · Apr 15, 2024 · 6f46f2d
1 parent d4a4196
Showing 6 changed files with 157 additions and 164 deletions.
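
The breaking change to collector resets listed in the commit message can be illustrated with a small sketch. `ToyEnv` and `ToyCollector` below are hypothetical stand-ins written for this note, not Tianshou classes. They only mimic the described behaviour: construction does not touch the environment, so the caller either calls `reset` explicitly or passes `reset_before_collect=True`. Raising `ValueError` when neither was done is an assumption of the sketch.

```python
from typing import List, Optional


class ToyEnv:
    """Hypothetical environment: observations count upward from 0."""

    def __init__(self) -> None:
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self) -> int:
        self.state += 1
        return self.state


class ToyCollector:
    """Hypothetical stand-in that mimics only the reset behaviour."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env
        # No env.reset() here: construction leaves the environment untouched.
        self._last_obs: Optional[int] = None

    def reset(self) -> None:
        self._last_obs = self.env.reset()

    def collect(self, n_step: int, reset_before_collect: bool = False) -> List[int]:
        if reset_before_collect:
            self.reset()
        if self._last_obs is None:
            # Assumed behaviour for this sketch: refuse to collect without a reset.
            raise ValueError("call reset() or pass reset_before_collect=True")
        observations = []
        for _ in range(n_step):
            observations.append(self._last_obs)
            self._last_obs = self.env.step()
        return observations


collector = ToyCollector(ToyEnv())
first = collector.collect(n_step=3, reset_before_collect=True)  # [0, 1, 2]
second = collector.collect(n_step=2)  # continues without a reset: [3, 4]
```

Keeping the reset out of the constructor means building a collector has no side effects on the environment, and the caller decides whether a later `collect` continues from the previous state or starts fresh.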
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,4 +1,27 @@
 # Changelog
 
+## Release 1.1.0
+
+### Api Extensions
+- Batch received two new methods: `to_dict` and `to_list_of_dicts`. #1063
+- `Collector`s can now be closed, and their reset is more granular. #1063
+- Trainers can control whether collectors should be reset prior to training. #1063
+- Convenience constructor for `CollectStats` called `with_autogenerated_stats`. #1063
+
+### Internal Improvements
+- `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. #1063
+- Introduced a first iteration of a naming convention for vars in `Collector`s. #1063
+- Generally improved readability of Collector code and associated tests (still quite some way to go). #1063
+- Improved typing for `exploration_noise` and within Collector. #1063
+
+### Breaking Changes
+
+- Removed `.data` attribute from `Collector` and its child classes. #1063
+- Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset`
+explicitly or pass `reset_before_collect=True`. #1063
+- VectorEnvs now return an array of info-dicts on reset instead of a list. #1063
+- Fixed `iter(Batch(...))`, which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. #1063
+
+
 Started after v1.0.0
 
diff --git a/test/base/test_buffer.py b/test/base/test_buffer.py
@@ -28,7 +28,7 @@
     from test.base.env import MoveToRightEnv, MyGoalEnv
 
 
-def test_replaybuffer(size: int = 10, bufsize: int = 20) -> None:
+def test_replaybuffer(size=10, bufsize=20) -> None:
     env = MoveToRightEnv(size)
     buf = ReplayBuffer(bufsize)
     buf.update(buf)
@@ -218,7 +218,7 @@ def test_ignore_obs_next(size: int = 10) -> None:
     assert data.obs_next
 
 
-def test_stack(size: int = 5, bufsize: int = 9, stack_num: int = 4, cached_num: int = 3) -> None:
+def test_stack(size=5, bufsize=9, stack_num=4, cached_num=3) -> None:
     env = MoveToRightEnv(size)
     buf = ReplayBuffer(bufsize, stack_num=stack_num)
     buf2 = ReplayBuffer(bufsize, stack_num=stack_num, sample_avail=True)
@@ -289,7 +289,7 @@ def test_stack(size: int = 5, bufsize: int = 9, stack_num: int = 4, cached_num:
         buf[bufsize * 2]
 
 
-def test_priortized_replaybuffer(size: int = 32, bufsize: int = 15) -> None:
+def test_priortized_replaybuffer(size=32, bufsize=15) -> None:
     env = MoveToRightEnv(size)
     buf = PrioritizedReplayBuffer(bufsize, 0.5, 0.5)
     buf2 = PrioritizedVectorReplayBuffer(bufsize, buffer_num=3, alpha=0.5, beta=0.5)