-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/extension collector buffer #1196
Conversation
1. Support for evaluating training runs 2. Improved handling of figures and axes 3. Allow passing max_env_step 4. Use min len of all experiments (bugfix, previously it would crash if experiments had different lengths)
Minor improvements in typing
…rings Note: the new config option will be used in follow-up commits
…or-buffer # Conflicts: # docs/spelling_wordlist.txt # tianshou/evaluation/rliable_evaluation_hl.py # tianshou/highlevel/logger.py
Extensions: - new property `subbuffer_edges` in normal and vectorized buffer - forwarded set_array_at_key, hasnull, isnull and dropnull from Batch - added last_index creation to init of `BufferManager` Breaking: - Better input validation and checks for malformed buffer Non-functional: Many renamings, comments, docstrings and TODOs
Previously the advantages were normalized multiple times if `repeat` is set to more than 1 Also minor improvement and extension of PPOTrainingStats
1. improved logging 2. extended resetting possibilities 3. collect stats for n_episodes 4. raise error on NaNs in buffer Added some comments and TODOs
1. Added support for Step and Episode hooks. The latter are particularly tricky. 2. Refactoring: collect results of action computation in batch instead of in tuple 3. Enhanced input validation 4. Better variable names Also a bunch of comments and todos
# TODO 1: this is only here because of atari, it should never be needed (can be solved with index) | ||
# and should be removed | ||
# TODO 2: does something entirely different from getitem | ||
# TODO 3: key should not be required | ||
stack_num: int | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack_num
can potentially be used for RNNs. I think I saw this note also in one of the tutorials. But if there is no big support planned for RNNs on the horizon, it can probably be removed.
If the environment already stacks frames (e.g. using a `FrameStack` wrapper), this should either not | ||
be used or should be used in conjunction with :attr:`replay_buffer_save_only_last_obs`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the usage recommendation concerning the related parameter dropped?
Not clearly an improvement over the old docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to recommend to use FrameStack instead of using this option, previously this recommendation was not clear. I will restore the "in conjunction with :attr:replay_buffer_save_only_last_obs
." part though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, pls resolve if you agree with the new formulation
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #1196 +/- ##
==========================================
- Coverage 85.50% 85.08% -0.42%
==========================================
Files 102 102
Lines 8649 8878 +229
==========================================
+ Hits 7395 7554 +159
- Misses 1254 1324 +70
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Also removed some unnecessary indirections
…or-buffer # Conflicts: # tianshou/highlevel/config.py
Regarding the collector updates, did you base them on the changes in the imac branch only? Or did you also consider the approach I made in the imac-with-buffer branch? There, I already made some efforts of moving the buffer related parts to the buffer and also moved some of the logic outside the Collector. |
@maxhuettenrauch I thought I did base it on the imac-with-buffer branch, but now realized that I was wrong. Will adjust the collector-buffer things accordingly, your design is indeed better than this one (at least after a quick look)! |
…ts are computed The Collector had to become generic to enable this Moved some stats computation to BaseCollector Extended the hook setting/getting and added tests for hooks Added a lot of documentation on the collect method
I had a closer look at your implementation in the work on imac. I understood that you needed to customize the CollectStats computation but I think it's not a good idea to couple stats with the buffer. Apart from complicating interactions between various concepts, the collect stats should be info from the collect iteration and not from the whole buffer. I am also not a fan of dynamically defined classes, I consider them an antipattern and they add a lot of complexity (as you have seen yourself when giving appropriate names). I also considered bundling the callbacks into a single class, but in the end decided not to do that. It's an additional indirection and an additional class, adding complexity to the user and I don't think it adds much value. The hooks are a bit complicated to understand and the documentation of them and access to them from the collector should be as visible as possible. So I think leaving them as explicit kwargs and with explicit getters/setters is better. In order to permit the customization you required I instead moved the logic of stats computation and updates to the To compensate for the increased complexity in For me, this PR is ready to be merged. Pls have a look at the changes and see whether you agree or have comments. After this is merged, we should extend the highlevel interfaces to include the possibility to customize stats. This PR also presents a large step towards customized logging. The main additional change is in 64cdf14. To make mypy happy the generic had to be specified wherever Collector is instantiated, so most files in the project were affected, polluting the diff |
…or-buffer # Conflicts: # docs/02_notebooks/L6_Trainer.ipynb # tianshou/highlevel/experiment.py
After this PR and the corresponding extension of HL interfaces are through, I'd release the new version of tianshou, as this presents useful additions to the functionality and customizability of the library |
Integration to HL Interfaces will follow, some changes to the design might happen then. But this gives a viable state to build on. Since it's mainly an extension, and was thoroughly tested, I went in for the merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor point, overall lgtm
@@ -122,12 +122,12 @@ def dist(loc_scale: tuple[torch.Tensor, torch.Tensor]) -> Distribution: | |||
action_space=env.action_space, | |||
) | |||
# collector | |||
train_collector = Collector( | |||
train_collector = Collector[CollectStats]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nit: can we make it as default setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we can't - it's a bug in mypy. It already is the default setting (as the bound in the Generic is set to CollectStats), but mypy will still complain. Until the issue is fixed in numpy, the smaller evil is to specify the type in our codebase.
Users don't have to do this and will get proper autocompletion without this. Also, I think pylance doesn't have this bug, but well, we're using mypy in CI for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we replace mypy with pyright?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, I was also thinking about that for a while. It wouldn't be super smooth though, pyright is stricter and there will probably be new typing errors.
For now, my pain with mypy is not enough to make me want to implement the switch. If you or anyone from the community want to do that, feel free! I essentially don't care as long as we have some type-checker in CI
@@ -278,6 +278,13 @@ def get_sliced_dist(dist: TDistribution, index: IndexType) -> TDistribution: | |||
raise NotImplementedError(f"Unsupported distribution for slicing: {dist}") | |||
|
|||
|
|||
def get_len_of_dist(dist: Distribution) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why do you put dist util function under buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a bunch of dist utils that are scattered around collector, batch and buffer at the moment, and we don't have a dedicated module for them yet.
I wanted to make one and also a dedicated test module in a separate PR soon (before the next release). Until then, it was just a sort of random decision where to put them
Internal improvements post #1196
Adds important functionality to buffer and collector. The PR is very large but I didn't want to split it up. It's easiest to review commit by commit, and I think various people should have a look. One can also look file by file. Together we can do this ;)
I'll edit the description when the review is done
@Trinkle23897: pls manly have a look at the the changes in buffer related things, and if you want also in the computation of n_step return. I had to slightly modify one of the tests that was changing the private
_insertion_index
leading to a malformed buffer, which now raises an error. Ofc you are very welcome to look at the rest as well :)@opcode81 and @maxhuettenrauch : pls have a look at the extensions in Collector. They are untested for now, wanted to get your opinion on the design first. Also, a quick glance at the trainer would be nice
Ah, also @Trinkle23897: I think I found a bug in the PPO implementation, see corresponding commit
@dantp-ai : the changes to the buffer here will make the task of fixing slicing issues easier, especially the new names and additional comments. Would also be happy about your review, if you have time!