Feat/refactor collector (thu-ml#1063)
Closes: thu-ml#1058

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular.
thu-ml#1063
- Trainers can control whether collectors should be reset prior to
training. thu-ml#1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state; the few stateful things are stored
explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead,
the user may need to call `reset`
explicitly or pass `reset_before_collect=True`. thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. thu-ml#1063
- Fixed `iter(Batch(...))`, which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063
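
The reset-related breaking change above can be illustrated with a minimal sketch. It does not use Tianshou itself: `ToyEnv` and `ToyCollector` are hypothetical stand-ins written for this note, and only the names `reset` and `reset_before_collect` are taken from the changelog entry. The actual `Collector` signature may differ.

```python
class ToyEnv:
    """Hypothetical stand-in for an environment (not a Tianshou class)."""

    def __init__(self) -> None:
        self.state: int | None = None  # None means "never reset"

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self) -> int:
        if self.state is None:
            raise RuntimeError("environment must be reset before stepping")
        self.state += 1
        return self.state


class ToyCollector:
    """Hypothetical collector that does NOT reset its env on construction."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env  # assumption for illustration: no env.reset() here

    def reset(self) -> None:
        self.env.reset()

    def collect(self, n_step: int, reset_before_collect: bool = False) -> list[int]:
        # Reset only when the caller asks for it explicitly.
        if reset_before_collect:
            self.reset()
        return [self.env.step() for _ in range(n_step)]


collector = ToyCollector(ToyEnv())
first = collector.collect(3, reset_before_collect=True)  # resets, then steps
second = collector.collect(2)  # continues from the previous state
```

In this sketch, calling `collect` on a freshly constructed collector without either an explicit `reset()` or `reset_before_collect=True` raises an error, which is the kind of failure callers relying on the old reset-on-init behavior might run into.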

---------

Co-authored-by: Michael Panchenko
2 people authored and ZhengLi1314 committed Apr 15, 2024
1 parent d4a4196 commit 6f46f2d
Showing 6 changed files with 157 additions and 164 deletions.
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -1,4 +1,27 @@
# Changelog

## Release 1.1.0

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`. #1063
- `Collector`s can now be closed, and their reset is more granular. #1063
- Trainers can control whether collectors should be reset prior to training. #1063
- Convenience constructor for `CollectStats` called `with_autogenerated_stats`. #1063

### Internal Improvements
- `Collector`s rely less on state; the few stateful things are stored explicitly instead of through a `.data` attribute. #1063
- Introduced a first iteration of a naming convention for vars in `Collector`s. #1063
- Generally improved readability of Collector code and associated tests (still quite some way to go). #1063
- Improved typing for `exploration_noise` and within Collector. #1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes. #1063
- Collectors no longer reset the environment on initialization. Instead, the user may need to call `reset`
explicitly or pass `reset_before_collect=True`. #1063
- VectorEnvs now return an array of info-dicts on reset instead of a list. #1063
- Fixed `iter(Batch(...))`, which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. #1063


Started after v1.0.0

6 changes: 3 additions & 3 deletions test/base/test_buffer.py
@@ -28,7 +28,7 @@
from test.base.env import MoveToRightEnv, MyGoalEnv


-def test_replaybuffer(size: int = 10, bufsize: int = 20) -> None:
+def test_replaybuffer(size=10, bufsize=20) -> None:
    env = MoveToRightEnv(size)
    buf = ReplayBuffer(bufsize)
    buf.update(buf)
@@ -218,7 +218,7 @@ def test_ignore_obs_next(size: int = 10) -> None:
assert data.obs_next


-def test_stack(size: int = 5, bufsize: int = 9, stack_num: int = 4, cached_num: int = 3) -> None:
+def test_stack(size=5, bufsize=9, stack_num=4, cached_num=3) -> None:
    env = MoveToRightEnv(size)
    buf = ReplayBuffer(bufsize, stack_num=stack_num)
    buf2 = ReplayBuffer(bufsize, stack_num=stack_num, sample_avail=True)
@@ -289,7 +289,7 @@ def test_stack(size: int = 5, bufsize: int = 9, stack_num: int = 4, cached_num:
buf[bufsize * 2]


-def test_priortized_replaybuffer(size: int = 32, bufsize: int = 15) -> None:
+def test_priortized_replaybuffer(size=32, bufsize=15) -> None:
    env = MoveToRightEnv(size)
    buf = PrioritizedReplayBuffer(bufsize, 0.5, 0.5)
    buf2 = PrioritizedVectorReplayBuffer(bufsize, buffer_num=3, alpha=0.5, beta=0.5)