Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/extension collector buffer #1196

Merged
merged 39 commits into from
Aug 20, 2024

Conversation

MischaPanch
Copy link
Collaborator

@MischaPanch MischaPanch commented Aug 8, 2024

Adds important functionality to buffer and collector. The PR is very large but I didn't want to split it up. It's easiest to review commit by commit, and I think various people should have a look. One can also look file by file. Together we can do this ;)

I'll edit the description when the review is done

@Trinkle23897: pls manly have a look at the the changes in buffer related things, and if you want also in the computation of n_step return. I had to slightly modify one of the tests that was changing the private _insertion_index leading to a malformed buffer, which now raises an error. Ofc you are very welcome to look at the rest as well :)

@opcode81 and @maxhuettenrauch : pls have a look at the extensions in Collector. They are untested for now, wanted to get your opinion on the design first. Also, a quick glance at the trainer would be nice

Ah, also @Trinkle23897: I think I found a bug in the PPO implementation, see corresponding commit

@dantp-ai : the changes to the buffer here will make the task of fixing slicing issues easier, especially the new names and additional comments. Would also be happy about your review, if you have time!

Michael Panchenko and others added 20 commits August 1, 2024 18:06
1. Support for evaluating training runs
2. Improved handling of figures and axes
3. Allow passing max_env_step
4. Use min len of all experiments (bugfix, previously it would crash if experiments had different lengths)
…rings

Note: the new config option will be used in follow-up commits
…or-buffer

# Conflicts:
#	docs/spelling_wordlist.txt
#	tianshou/evaluation/rliable_evaluation_hl.py
#	tianshou/highlevel/logger.py
Extensions:
- new property `subbuffer_edges` in normal and vectorized buffer
- forwarded set_array_at_key, hasnull, isnull and dropnull from Batch
- added last_index creation to init of `BufferManager`

Breaking:
- Better input validation and checks for malformed buffer

Non-functional:

Many renamings, comments, docstrings and TODOs
Previously the advantages were normalized multiple times if `repeat` is set to more than 1

Also minor improvement and extension of PPOTrainingStats
1. improved logging
2. extended resetting possibilities
3. collect stats for n_episodes
4. raise error on NaNs in buffer

Added some comments and TODOs
1. Added support for Step and Episode hooks. The latter are particularly tricky.
2. Refactoring: collect results of action computation in batch instead of in tuple
3. Enhanced input validation
4. Better variable names

Also a bunch of comments and todos
# TODO 1: this is only here because of atari, it should never be needed (can be solved with index)
# and should be removed
# TODO 2: does something entirely different from getitem
# TODO 3: key should not be required
stack_num: int | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stack_num can potentially be used for RNNs. I think I saw this note also in one of the tutorials. But if there is no big support planned for RNNs on the horizon, it can probably be removed.

tianshou/data/buffer/base.py Show resolved Hide resolved
Comment on lines -135 to -136
If the environment already stacks frames (e.g. using a `FrameStack` wrapper), this should either not
be used or should be used in conjunction with :attr:`replay_buffer_save_only_last_obs`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the usage recommendation concerning the related parameter dropped?
Not clearly an improvement over the old docstring.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to recommend to use FrameStack instead of using this option, previously this recommendation was not clear. I will restore the "in conjunction with :attr:replay_buffer_save_only_last_obs." part though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, pls resolve if you agree with the new formulation

@codecov-commenter
Copy link

codecov-commenter commented Aug 9, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 80.05540% with 72 lines in your changes missing coverage. Please review.

Project coverage is 85.08%. Comparing base (f5d2ae6) to head (56e0b3c).
Report is 11 commits behind head on master.

Files Patch % Lines
tianshou/data/collector.py 75.00% 36 Missing ⚠️
tianshou/policy/base.py 50.00% 23 Missing ⚠️
tianshou/data/buffer/base.py 94.73% 3 Missing ⚠️
tianshou/trainer/base.py 90.00% 3 Missing ⚠️
tianshou/policy/modelfree/ppo.py 88.88% 2 Missing ⚠️
tianshou/utils/torch_utils.py 88.88% 2 Missing ⚠️
tianshou/data/buffer/manager.py 96.29% 1 Missing ⚠️
tianshou/highlevel/config.py 90.90% 1 Missing ⚠️
tianshou/highlevel/params/lr_scheduler.py 0.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1196      +/-   ##
==========================================
- Coverage   85.50%   85.08%   -0.42%     
==========================================
  Files         102      102              
  Lines        8649     8878     +229     
==========================================
+ Hits         7395     7554     +159     
- Misses       1254     1324      +70     
Flag Coverage Δ
unittests 85.08% <80.05%> (-0.42%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maxhuettenrauch
Copy link
Collaborator

Regarding the collector updates, did you base them on the changes in the imac branch only? Or did you also consider the approach I made in the imac-with-buffer branch? There, I already made some efforts of moving the buffer related parts to the buffer and also moved some of the logic outside the Collector.

@MischaPanch
Copy link
Collaborator Author

MischaPanch commented Aug 13, 2024

Regarding the collector updates, did you base them on the changes in the imac branch only? Or did you also consider the approach I made in the imac-with-buffer branch? There, I already made some efforts of moving the buffer related parts to the buffer and also moved some of the logic outside the Collector.

@maxhuettenrauch I thought I did base it on the imac-with-buffer branch, but now realized that I was wrong. Will adjust the collector-buffer things accordingly, your design is indeed better than this one (at least after a quick look)!

Michael Panchenko added 6 commits August 18, 2024 12:00
…ts are computed

The Collector had to become generic to enable this
Moved some stats computation to BaseCollector
Extended the hook setting/getting and added tests for hooks
Added a lot of documentation on the collect method
@MischaPanch
Copy link
Collaborator Author

@maxhuettenrauch

I had a closer look at your implementation in the work on imac. I understood that you needed to customize the CollectStats computation but I think it's not a good idea to couple stats with the buffer. Apart from complicating interactions between various concepts, the collect stats should be info from the collect iteration and not from the whole buffer. I am also not a fan of dynamically defined classes, I consider them an antipattern and they add a lot of complexity (as you have seen yourself when giving appropriate names).

I also considered bundling the callbacks into a single class, but in the end decided not to do that. It's an additional indirection and an additional class, adding complexity to the user and I don't think it adds much value. The hooks are a bit complicated to understand and the documentation of them and access to them from the collector should be as visible as possible. So I think leaving them as explicit kwargs and with explicit getters/setters is better.

In order to permit the customization you required I instead moved the logic of stats computation and updates to the CollectStats class itself and made collector generic. So, I did implement your change on the Collector receiving a CollecStats constructor. The genericness allows for static analysis with customized stats.

To compensate for the increased complexity in _collect I have added very detailed documentation that helps understanding and navigating the code. I also added tests now.

For me, this PR is ready to be merged. Pls have a look at the changes and see whether you agree or have comments.

After this is merged, we should extend the highlevel interfaces to include the possibility to customize stats.

This PR also presents a large step towards customized logging.

The main additional change is in 64cdf14. To make mypy happy the generic had to be specified wherever Collector is instantiated, so most files in the project were affected, polluting the diff

@opcode81 @bordeauxred FYI

Michael Panchenko added 3 commits August 18, 2024 17:30
…or-buffer

# Conflicts:
#	docs/02_notebooks/L6_Trainer.ipynb
#	tianshou/highlevel/experiment.py
@MischaPanch
Copy link
Collaborator Author

After this PR and the corresponding extension of HL interfaces are through, I'd release the new version of tianshou, as this presents useful additions to the functionality and customizability of the library

tianshou/data/collector.py Show resolved Hide resolved
tianshou/data/collector.py Show resolved Hide resolved
tianshou/data/collector.py Show resolved Hide resolved
@MischaPanch MischaPanch merged commit 002ffd9 into master Aug 20, 2024
@MischaPanch MischaPanch deleted the feature/extension-collector-buffer branch August 20, 2024 15:55
@MischaPanch
Copy link
Collaborator Author

Integration to HL Interfaces will follow, some changes to the design might happen then. But this gives a viable state to build on.

Since it's mainly an extension, and was thoroughly tested, I went in for the merge

Copy link
Collaborator

@Trinkle23897 Trinkle23897 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor point, overall lgtm

@@ -122,12 +122,12 @@ def dist(loc_scale: tuple[torch.Tensor, torch.Tensor]) -> Distribution:
action_space=env.action_space,
)
# collector
train_collector = Collector(
train_collector = Collector[CollectStats](
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit: can we make it as default setting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, we can't - it's a bug in mypy. It already is the default setting (as the bound in the Generic is set to CollectStats), but mypy will still complain. Until the issue is fixed in numpy, the smaller evil is to specify the type in our codebase.

Users don't have to do this and will get proper autocompletion without this. Also, I think pylance doesn't have this bug, but well, we're using mypy in CI for now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we replace mypy with pyright?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, I was also thinking about that for a while. It wouldn't be super smooth though, pyright is stricter and there will probably be new typing errors.

For now, my pain with mypy is not enough to make me want to implement the switch. If you or anyone from the community want to do that, feel free! I essentially don't care as long as we have some type-checker in CI

@@ -278,6 +278,13 @@ def get_sliced_dist(dist: TDistribution, index: IndexType) -> TDistribution:
raise NotImplementedError(f"Unsupported distribution for slicing: {dist}")


def get_len_of_dist(dist: Distribution) -> int:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why do you put dist util function under buffer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a bunch of dist utils that are scattered around collector, batch and buffer at the moment, and we don't have a dedicated module for them yet.

I wanted to make one and also a dedicated test module in a separate PR soon (before the next release). Until then, it was just a sort of random decision where to put them

MischaPanch added a commit that referenced this pull request Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants