Feature/extension collector buffer #1196

MischaPanch · 2024-08-08T16:31:02Z

Adds important functionality to buffer and collector. The PR is very large but I didn't want to split it up. It's easiest to review commit by commit, and I think various people should have a look. One can also look file by file. Together we can do this ;)

I'll edit the description when the review is done

@Trinkle23897: pls mainly have a look at the changes in buffer-related things, and if you want also in the computation of n_step return. I had to slightly modify one of the tests that was changing the private _insertion_index, leading to a malformed buffer, which now raises an error. Ofc you are very welcome to look at the rest as well :)

@opcode81 and @maxhuettenrauch : pls have a look at the extensions in Collector. They are untested for now, wanted to get your opinion on the design first. Also, a quick glance at the trainer would be nice

Ah, also @Trinkle23897: I think I found a bug in the PPO implementation, see corresponding commit

@dantp-ai : the changes to the buffer here will make the task of fixing slicing issues easier, especially the new names and additional comments. Would also be happy about your review, if you have time!

Fixup

1. Support for evaluating training runs
2. Improved handling of figures and axes
3. Allow passing max_env_step
4. Use min len of all experiments (bugfix, previously it would crash if experiments had different lengths)

Minor improvements in typing

…rings Note: the new config option will be used in follow-up commits

…or-buffer # Conflicts: # docs/spelling_wordlist.txt # tianshou/evaluation/rliable_evaluation_hl.py # tianshou/highlevel/logger.py

Extensions:
- new property `subbuffer_edges` in normal and vectorized buffer
- forwarded set_array_at_key, hasnull, isnull and dropnull from Batch
- added last_index creation to init of `BufferManager`

Breaking:
- Better input validation and checks for malformed buffer

Non-functional: Many renamings, comments, docstrings and TODOs
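To illustrate what subbuffer edges express, here is a minimal sketch. It assumes that a vectorized buffer is a flat concatenation of subbuffers and that the edges are the cumulative offsets between them. The helpers `subbuffer_edges_sketch` and `find_subbuffer` are hypothetical names made up for this example and are not the implementation in this PR.

```python
from bisect import bisect_right
from itertools import accumulate


def subbuffer_edges_sketch(subbuffer_sizes: list[int]) -> list[int]:
    """Return cumulative offsets [0, s0, s0 + s1, ...] delimiting the subbuffers."""
    return [0, *accumulate(subbuffer_sizes)]


def find_subbuffer(edges: list[int], index: int) -> int:
    """Return the position of the subbuffer that contains the flat index."""
    if not 0 <= index < edges[-1]:
        raise IndexError(f"index {index} is outside of a buffer of size {edges[-1]}")
    # The subbuffer is the one whose left edge is the last edge <= index.
    return bisect_right(edges, index) - 1


# Three subbuffers of size 4 each (assumed layout, for illustration only).
edges = subbuffer_edges_sketch([4, 4, 4])
first = find_subbuffer(edges, 0)
second = find_subbuffer(edges, 4)
last = find_subbuffer(edges, 11)
```

Under this assumption a non-vectorized buffer is the special case of a single subbuffer, so its edges are just the start and the end of the buffer.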

Previously the advantages were normalized multiple times if `repeat` is set to more than 1 Also minor improvement and extension of PPOTrainingStats

1. improved logging
2. extended resetting possibilities
3. collect stats for n_episodes
4. raise error on NaNs in buffer

Added some comments and TODOs

1. Added support for Step and Episode hooks. The latter are particularly tricky.
2. Refactoring: collect results of action computation in batch instead of in tuple
3. Enhanced input validation
4. Better variable names

Also a bunch of comments and todos

dantp-ai · 2024-08-08T19:37:51Z

tianshou/data/buffer/base.py

+        # TODO 1: this is only here because of atari, it should never be needed (can be solved with index)
+        #  and should be removed
+        # TODO 2: does something entirely different from getitem
+        # TODO 3: key should not be required
        stack_num: int | None = None,


stack_num can potentially be used for RNNs. I think I saw this note also in one of the tutorials. But if there is no big support planned for RNNs on the horizon, it can probably be removed.

tianshou/data/buffer/base.py

tianshou/highlevel/config.py

opcode81 · 2024-08-09T14:06:00Z

tianshou/highlevel/config.py

-    If the environment already stacks frames (e.g. using a `FrameStack` wrapper), this should either not
-    be used or should be used in conjunction with :attr:`replay_buffer_save_only_last_obs`.


Why was the usage recommendation concerning the related parameter dropped?
Not clearly an improvement over the old docstring.

I wanted to recommend using FrameStack instead of this option; previously this recommendation was not clear. I will restore the "in conjunction with :attr:`replay_buffer_save_only_last_obs`" part, though.

Done, pls resolve if you agree with the new formulation

codecov-commenter · 2024-08-09T17:45:09Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 80.05540% with 72 lines in your changes missing coverage. Please review.

Project coverage is 85.08%. Comparing base (f5d2ae6) to head (56e0b3c).
Report is 11 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| tianshou/data/collector.py | 75.00% | 36 Missing ⚠️ |
| tianshou/policy/base.py | 50.00% | 23 Missing ⚠️ |
| tianshou/data/buffer/base.py | 94.73% | 3 Missing ⚠️ |
| tianshou/trainer/base.py | 90.00% | 3 Missing ⚠️ |
| tianshou/policy/modelfree/ppo.py | 88.88% | 2 Missing ⚠️ |
| tianshou/utils/torch_utils.py | 88.88% | 2 Missing ⚠️ |
| tianshou/data/buffer/manager.py | 96.29% | 1 Missing ⚠️ |
| tianshou/highlevel/config.py | 90.90% | 1 Missing ⚠️ |
| tianshou/highlevel/params/lr_scheduler.py | 0.00% | 1 Missing ⚠️ |

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1196      +/-   ##
==========================================
- Coverage   85.50%   85.08%   -0.42%     
==========================================
  Files         102      102              
  Lines        8649     8878     +229     
==========================================
+ Hits         7395     7554     +159     
- Misses       1254     1324      +70

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `85.08% <80.05%> (-0.42%)` | ⬇️ |


Also removed some unnecessary indirections

…or-buffer # Conflicts: # tianshou/highlevel/config.py

maxhuettenrauch · 2024-08-13T12:45:47Z

Regarding the collector updates, did you base them on the changes in the imac branch only? Or did you also consider the approach I made in the imac-with-buffer branch? There, I already made some efforts of moving the buffer related parts to the buffer and also moved some of the logic outside the Collector.

MischaPanch · 2024-08-13T13:43:43Z

> Regarding the collector updates, did you base them on the changes in the imac branch only? Or did you also consider the approach I made in the imac-with-buffer branch? There, I already made some efforts of moving the buffer related parts to the buffer and also moved some of the logic outside the Collector.

@maxhuettenrauch I thought I did base it on the imac-with-buffer branch, but now realized that I was wrong. Will adjust the collector-buffer things accordingly, your design is indeed better than this one (at least after a quick look)!

…make sense)

…ts are computed
The Collector had to become generic to enable this
Moved some stats computation to BaseCollector
Extended the hook setting/getting and added tests for hooks
Added a lot of documentation on the collect method

…it for mypi

MischaPanch · 2024-08-18T15:30:12Z

@maxhuettenrauch

I had a closer look at your implementation in the work on imac. I understood that you needed to customize the CollectStats computation but I think it's not a good idea to couple stats with the buffer. Apart from complicating interactions between various concepts, the collect stats should be info from the collect iteration and not from the whole buffer. I am also not a fan of dynamically defined classes, I consider them an antipattern and they add a lot of complexity (as you have seen yourself when giving appropriate names).

I also considered bundling the callbacks into a single class, but in the end decided not to do that. It's an additional indirection and an additional class, adding complexity to the user and I don't think it adds much value. The hooks are a bit complicated to understand and the documentation of them and access to them from the collector should be as visible as possible. So I think leaving them as explicit kwargs and with explicit getters/setters is better.
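The design choice described above (hooks as explicit kwargs with explicit getters and setters, rather than bundled in a callback class) can be sketched roughly as follows. This is only an illustration under assumptions: `HookedCollectorSketch`, the hook signatures, and the dict-based transitions are hypothetical and much simpler than the real Collector.

```python
from collections.abc import Callable
from typing import Any, Optional

Transition = dict[str, Any]
StepHook = Callable[[Transition], None]
EpisodeHook = Callable[[list[Transition]], None]


class HookedCollectorSketch:
    """Toy collector that fires a hook per step and a hook per finished episode."""

    def __init__(
        self,
        on_step_hook: Optional[StepHook] = None,
        on_episode_done_hook: Optional[EpisodeHook] = None,
    ) -> None:
        self._on_step_hook = on_step_hook
        self._on_episode_done_hook = on_episode_done_hook

    def set_on_step_hook(self, hook: Optional[StepHook]) -> None:
        self._on_step_hook = hook

    def get_on_step_hook(self) -> Optional[StepHook]:
        return self._on_step_hook

    def set_on_episode_done_hook(self, hook: Optional[EpisodeHook]) -> None:
        self._on_episode_done_hook = hook

    def get_on_episode_done_hook(self) -> Optional[EpisodeHook]:
        return self._on_episode_done_hook

    def collect(self, transitions: list[Transition]) -> int:
        """Replay pre-recorded transitions, fire the hooks, return the number of finished episodes."""
        episode: list[Transition] = []
        n_episodes = 0
        for transition in transitions:
            if self._on_step_hook is not None:
                self._on_step_hook(transition)
            episode.append(transition)
            if transition["done"]:
                # The episode hook sees all transitions of the finished episode at once.
                if self._on_episode_done_hook is not None:
                    self._on_episode_done_hook(episode)
                episode = []
                n_episodes += 1
        return n_episodes


steps_seen: list[Transition] = []
episode_lengths: list[int] = []
collector = HookedCollectorSketch(on_step_hook=steps_seen.append)
collector.set_on_episode_done_hook(lambda ep: episode_lengths.append(len(ep)))
n_done = collector.collect(
    [
        {"rew": 1.0, "done": False},
        {"rew": 0.0, "done": True},
        {"rew": 2.0, "done": True},
    ],
)
```

Keeping the hooks as plain keyword arguments means they show up directly in the constructor signature and docstring, which is the visibility argument made above.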

In order to permit the customization you required, I instead moved the logic of stats computation and updates to the CollectStats class itself and made the Collector generic. So, I did implement your change of the Collector receiving a CollectStats constructor. The genericness allows for static analysis with customized stats.
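A rough sketch of this pattern, assuming simplified stand-in classes: the stats class owns its update logic, and the collector is generic in the stats type and receives the stats constructor. `StatsSketch`, `MaxRewardStatsSketch`, and `CollectorSketch` are hypothetical names for illustration, not the classes in this PR.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass
class StatsSketch:
    """Stats of a single collect call; the update logic lives in the stats class."""

    n_steps: int = 0
    total_reward: float = 0.0

    def update(self, reward: float) -> None:
        self.n_steps += 1
        self.total_reward += reward


@dataclass
class MaxRewardStatsSketch(StatsSketch):
    """Customized stats that additionally track the largest reward seen."""

    max_reward: float = float("-inf")

    def update(self, reward: float) -> None:
        super().update(reward)
        self.max_reward = max(self.max_reward, reward)


TStats = TypeVar("TStats", bound=StatsSketch)


class CollectorSketch(Generic[TStats]):
    def __init__(self, stats_class: type[TStats]) -> None:
        self._stats_class = stats_class

    def collect(self, rewards: list[float]) -> TStats:
        # Fresh stats per call: they describe this collect iteration, not the whole buffer.
        stats = self._stats_class()
        for reward in rewards:
            stats.update(reward)
        return stats


collector = CollectorSketch[MaxRewardStatsSketch](MaxRewardStatsSketch)
result = collector.collect([1.0, 3.0, 2.0])
```

Because the return type of `collect` is the type variable, a type checker can see that `result.max_reward` exists without a cast.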

To compensate for the increased complexity in _collect I have added very detailed documentation that helps understanding and navigating the code. I also added tests now.

For me, this PR is ready to be merged. Pls have a look at the changes and see whether you agree or have comments.

After this is merged, we should extend the highlevel interfaces to include the possibility to customize stats.

This PR also presents a large step towards customized logging.

The main additional change is in 64cdf14. To make mypy happy the generic had to be specified wherever Collector is instantiated, so most files in the project were affected, polluting the diff

@opcode81 @bordeauxred FYI

…or-buffer

…or-buffer # Conflicts: # docs/02_notebooks/L6_Trainer.ipynb # tianshou/highlevel/experiment.py

MischaPanch · 2024-08-18T15:42:20Z

After this PR and the corresponding extension of HL interfaces are through, I'd release the new version of tianshou, as this presents useful additions to the functionality and customizability of the library

tianshou/data/collector.py

MischaPanch · 2024-08-20T15:58:18Z

Integration to HL Interfaces will follow, some changes to the design might happen then. But this gives a viable state to build on.

Since it's mainly an extension, and was thoroughly tested, I went in for the merge

Trinkle23897

very minor point, overall lgtm

Trinkle23897 · 2024-08-31T17:24:44Z

test/continuous/test_ppo.py

@@ -122,12 +122,12 @@ def dist(loc_scale: tuple[torch.Tensor, torch.Tensor]) -> Distribution:
        action_space=env.action_space,
    )
    # collector
-    train_collector = Collector(
+    train_collector = Collector[CollectStats](


small nit: can we make it as default setting?

Unfortunately, we can't - it's a bug in mypy. It already is the default setting (as the bound in the Generic is set to CollectStats), but mypy will still complain. Until the issue is fixed in mypy, the lesser evil is to specify the type in our codebase.

Users don't have to do this and will get proper autocompletion without this. Also, I think pylance doesn't have this bug, but well, we're using mypy in CI for now

should we replace mypy with pyright?

Maybe, I was also thinking about that for a while. It wouldn't be super smooth though, pyright is stricter and there will probably be new typing errors.

For now, my pain with mypy is not enough to make me want to implement the switch. If you or anyone from the community want to do that, feel free! I essentially don't care as long as we have some type-checker in CI

Trinkle23897 · 2024-09-02T17:21:23Z

tianshou/data/batch.py

@@ -278,6 +278,13 @@ def get_sliced_dist(dist: TDistribution, index: IndexType) -> TDistribution:
        raise NotImplementedError(f"Unsupported distribution for slicing: {dist}")


+def get_len_of_dist(dist: Distribution) -> int:


nit: why do you put dist util function under buffer?

there's a bunch of dist utils that are scattered around collector, batch and buffer at the moment, and we don't have a dedicated module for them yet.

I wanted to make one and also a dedicated test module in a separate PR soon (before the next release). Until then, it was just a sort of random decision where to put them
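For context, the body of `get_len_of_dist` is not shown in the diff excerpt above. A minimal sketch of what such a helper might do, assuming the length of a distribution means the leading entry of its `batch_shape` (an attribute that torch distributions expose), could look like this. `get_len_of_dist_sketch` and `FakeDist` are hypothetical stand-ins so that the example runs without torch.

```python
from typing import Protocol


class HasBatchShape(Protocol):
    """Anything exposing a batch_shape, as torch distributions do."""

    @property
    def batch_shape(self) -> tuple[int, ...]: ...


def get_len_of_dist_sketch(dist: HasBatchShape) -> int:
    """Return the leading batch dimension; an assumed definition of 'length'."""
    if len(dist.batch_shape) == 0:
        raise TypeError("a distribution with an empty batch_shape has no length")
    return dist.batch_shape[0]


class FakeDist:
    """Stand-in for a distribution object, used only for this illustration."""

    def __init__(self, batch_shape: tuple[int, ...]) -> None:
        self._batch_shape = batch_shape

    @property
    def batch_shape(self) -> tuple[int, ...]:
        return self._batch_shape


batched_len = get_len_of_dist_sketch(FakeDist((32, 4)))
```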

Internal improvements post #1196

Michael Panchenko and others added 20 commits August 1, 2024 18:06

TrainingStats: fix for zero-len sequences, fixed an optional type

e1709f0

Fixup

Rliable eval: multiple extensions

e41deca

1. Support for evaluating training runs
2. Improved handling of figures and axes
3. Allow passing max_env_step
4. Use min len of all experiments (bugfix, previously it would crash if experiments had different lengths)
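The bugfix in item 4 can be pictured with a small sketch, assuming each experiment contributes one list of scores and that the curves are cut to the shortest one before being stacked. `truncate_to_min_len` is a hypothetical helper for illustration, not the function used in the evaluation code.

```python
def truncate_to_min_len(curves: list[list[float]]) -> list[list[float]]:
    """Cut every curve to the length of the shortest one so they can be stacked."""
    if not curves:
        return []
    min_len = min(len(curve) for curve in curves)
    return [curve[:min_len] for curve in curves]


# Two runs of different length (made-up numbers, for illustration only).
aligned = truncate_to_min_len([[1.0, 2.0, 3.0], [4.0, 5.0]])
```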

Added WandbLoggerFactory, made config_dict optional

9ceb041

Logging: made restore_logged_data static. Eval: better use of DataScope

c492765

Minor improvements in typing

Spelling [ci skip]

547b626

Merge branch 'master' into feature/enhanced-rliable-eval

bd58804

Minor typing and docstrings [ci skip]

f18f4a4

New util: create_uniform_action_dist

434606d

HL, config: option to collect n_episodes, post-init validation, docst…

0a3fc25

…rings Note: the new config option will be used in follow-up commits

n-step-return: better variable names, more docstrings and comments

a95a8b1

Merge branch 'refs/heads/thuml-master' into feature/extension-collect…

47a2a5a

…or-buffer # Conflicts: # docs/spelling_wordlist.txt # tianshou/evaluation/rliable_evaluation_hl.py # tianshou/highlevel/logger.py

Fixup returns

29bc77a

Test: minor comment

6abfefc

Minor formatting, typing

d9842e8

PPO, Changed implementation detail! Important!

3193d61

Previously the advantages were normalized multiple times if `repeat` is set to more than 1 Also minor improvement and extension of PPOTrainingStats
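A toy sketch of how repeated normalization can arise, under the assumption that the normalized advantages were written back into the stored batch on every pass. With `write_back=True` each further repeat normalizes values that were already normalized; keeping the result in a local variable leaves the stored advantages untouched. The function names are hypothetical and this is not the code from the commit.

```python
import statistics


def normalize(values: list[float], eps: float = 1e-8) -> list[float]:
    """Shift to zero mean and scale by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


def ppo_epochs_sketch(
    stored_adv: list[float],
    repeat: int,
    minibatch_size: int,
    write_back: bool,
) -> tuple[list[float], list[float]]:
    """Return (stored advantages after all repeats, advantages used in the last step)."""
    stored = list(stored_adv)
    used: list[float] = []
    for _ in range(repeat):
        for start in range(0, len(stored), minibatch_size):
            stop = start + minibatch_size
            # Local variable: this is what a gradient step would consume.
            used = normalize(stored[start:stop])
            if write_back:
                # Assumed old behavior: the next repeat sees normalized values.
                stored[start:stop] = used
    return stored, used


raw = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
kept, used_last = ppo_epochs_sketch(raw, repeat=3, minibatch_size=4, write_back=False)
overwritten, _ = ppo_epochs_sketch(raw, repeat=3, minibatch_size=4, write_back=True)
```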

SAC: minor refactoring (extracted correct_log_prob_gaussian_tanh)

dfac7ad

Trainer: multiple small enhancements

e9d1b68

1. improved logging
2. extended resetting possibilities
3. collect stats for n_episodes
4. raise error on NaNs in buffer

Added some comments and TODOs

Minor, typing

404134b

MischaPanch requested review from opcode81, Trinkle23897 and maxhuettenrauch August 8, 2024 16:31

MischaPanch and others added 2 commits August 8, 2024 18:31

Merge branch 'master' into feature/extension-collector-buffer

a9eda24

Imports, docs

d6e3d0a

dantp-ai reviewed Aug 8, 2024

opcode81 reviewed Aug 9, 2024

tianshou/highlevel/config.py (resolved)

opcode81 reviewed Aug 9, 2024

tianshou/highlevel/config.py (outdated, resolved)

opcode81 reviewed Aug 9, 2024

tianshou/highlevel/config.py (outdated, resolved)

opcode81 reviewed Aug 9, 2024

MischaPanch mentioned this pull request Aug 10, 2024

Handling LR scheduling when episodes instead of steps are collected #1198

Open

Michael Panchenko added 4 commits August 10, 2024 15:53

Collector: improved rollout hooks interfaces and docstrings

0bb8e9c

Also removed some unnecessary indirections

Removed no longer needed array module

4d81520

Aesthetic: docstrings, var names

90d3959

Merge branch 'refs/heads/thuml-master' into feature/extension-collect…

56e0b3c

…or-buffer # Conflicts: # tianshou/highlevel/config.py

Michael Panchenko added 6 commits August 18, 2024 12:00

Collector: use a larger default buffer size (previous default didn't …

f6134d2

…make sense)

Batch: possibility to get len of batches with dist. Added a test

872183e

Test env: aesthetic

8a0f536

Buffer: better names for vars (no functional change)

9997f8c

Typing: since Collector generic, CollectStats need to be passed at in…

58879cb

…it for mypi

Michael Panchenko added 3 commits August 18, 2024 17:30

Merge branch 'refs/heads/thuml-master' into feature/extension-collect…

a1453b8

…or-buffer

Merge branch 'refs/heads/thuml-master' into feature/extension-collect…

463d818

…or-buffer # Conflicts: # docs/02_notebooks/L6_Trainer.ipynb # tianshou/highlevel/experiment.py

nb-clean

5098d32

Typos [ci skip]

419f3c5

maxhuettenrauch reviewed Aug 20, 2024

tianshou/data/collector.py (resolved)

Michael Panchenko added 2 commits August 20, 2024 14:32

Collector, fixed omission: make use of raise_on_nan_in_buffer

0a65552

Block comment [ci skip]

bd58581

maxhuettenrauch reviewed Aug 20, 2024

tianshou/data/collector.py (resolved)

tianshou/data/collector.py (resolved)

tianshou/data/collector.py (resolved)

MischaPanch merged commit 002ffd9 into master Aug 20, 2024

MischaPanch deleted the feature/extension-collector-buffer branch August 20, 2024 15:55

MischaPanch mentioned this pull request Sep 2, 2024

Moved subbuffer-related functionality from Collector to Buffer #1214

Merged

Trinkle23897 reviewed Sep 2, 2024


MischaPanch added a commit that referenced this pull request Sep 2, 2024

Moved subbuffer-related functionality from Collector to Buffer (#1214)

16f2fc2

Internal improvements post #1196
