Skip to content

Commit

Permalink
Prepare release 1.1.0 (#1200)
Browse files Browse the repository at this point in the history
Mainly adds changelog and docs, but also one breaking change of
`to_torch` and `to_numpy` interface
  • Loading branch information
MischaPanch authored Aug 10, 2024
2 parents f5d2ae6 + 007c3ca commit f4814c1
Show file tree
Hide file tree
Showing 6 changed files with 247 additions and 120 deletions.
311 changes: 214 additions & 97 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,133 +2,250 @@

## Release 1.1.0

### Highlights

#### Evaluation Package

This release introduces a new package `evaluation` that integrates best
practices for running experiments (seeding test and train environmets) and for
evaluating them using the [rliable](https://github.com/google-research/rliable)
library. This should be especially useful for algorithm developers for comparing
performances and creating meaningful visualizations. **This functionality is
currently in alpha state** and will be further improved in the next releases.
You will need to install tianshou with the extra `eval` to use it.

The creation of multiple experiments with varying random seeds has been greatly
facilitated. Moreover, the `ExpLauncher` interface has been introduced and
implemented with several backends to support the execution of multiple
experiments in parallel.

An example for this using the high-level interfaces can be found
[here](examples/mujoco/mujoco_ppo_hl_multi.py), examples that use low-level
interfaces will follow soon.

#### Improvements in Batch

Apart from that, several important
extensions have been added to internal data structures, most notably to `Batch`.
Batches now implement `__eq__` and can be meaningfully compared. Applying
operations in a nested fashion has been significantly simplified, and checking
for NaNs and dropping them is now possible.

One more notable change is that torch `Distribution` objects are now sliced when
slicing a batch. Previously, when a Batch with say 10 actions and a dist
corresponding to them was sliced to `[:3]`, the `dist` in the result would still
correspond to all 10 actions. Now, the dist is also "sliced" to be the
distribution of the first 3 actions.

A detailed list of changes can be found below.

### Changes/Improvements
- `evaluation`: New package for repeating the same experiment with multiple seeds and aggregating the results. #1074 #1141 #1183

- `evaluation`: New package for repeating the same experiment with multiple
seeds and aggregating the results. #1074 #1141 #1183
- `data`:
- `Batch`:
- Add methods `to_dict` and `to_list_of_dicts`. #1063 #1098
- Add methods `to_numpy_` and `to_torch_`. #1098, #1117
- Add `__eq__` (semantic equality check). #1098
- `keys()` deprecated in favor of `get_keys()` (needed to make iteration consistent with naming) #1105.
- Major: new methods for applying functions to values, to check for NaNs and drop them, and to set values. #1181
- Slicing a batch with a torch distribution now also slices the distribution. #1181
- `data.collector`:
- `Collector`:
- Introduced `BaseCollector` as a base class for all collectors. #1123
- Add method `close` #1063
- Method `reset` is now more granular (new flags controlling behavior). #1063
- `CollectStats`: Add convenience constructor `with_autogenerated_stats`. #1063
- `Batch`:
- Add methods `to_dict` and `to_list_of_dicts`. #1063 #1098
- Add methods `to_numpy_` and `to_torch_`. #1098, #1117
- Add `__eq__` (semantic equality check). #1098
- `keys()` deprecated in favor of `get_keys()` (needed to make iteration
consistent with naming) #1105.
- Major: new methods for applying functions to values, to check for NaNs
and drop them, and to set values. #1181
- Slicing a batch with a torch distribution now also slices the
distribution. #1181
- `data.collector`:
- `Collector`:
- Introduced `BaseCollector` as a base class for all collectors.
#1123
- Add method `close` #1063
- Method `reset` is now more granular (new flags controlling
behavior). #1063
- `CollectStats`: Add convenience
constructor `with_autogenerated_stats`. #1063
- `trainer`:
- Trainers can now control whether collectors should be reset prior to training. #1063
- policy:
- introduced attribute `in_training_step` that is controlled by the trainer. #1123
- policy automatically set to `eval` mode when collecting and to `train` mode when updating. #1123
- Extended interface of `compute_action` to also support array-like inputs #1169
- Trainers can now control whether collectors should be reset prior to
training. #1063
- `policy`:
- introduced attribute `in_training_step` that is controlled by the trainer.
#1123
- policy automatically set to `eval` mode when collecting and to `train`
mode when updating. #1123
- Extended interface of `compute_action` to also support array-like inputs
#1169
- `highlevel`:
- `SamplingConfig`:
- Add support for `batch_size=None`. #1077
- Add `training_seed` for explicit seeding of training and test environments, the `test_seed` is inferred from `training_seed`. #1074
- `experiment`:
- `Experiment` now has a `name` attribute, which can be set using `ExperimentBuilder.with_name` and
which determines the default run name and therefore the persistence subdirectory.
It can still be overridden in `Experiment.run()`, the new parameter name being `run_name` rather than
`experiment_name` (although the latter will still be interpreted correctly). #1074 #1131
- Add class `ExperimentCollection` for the convenient execution of multiple experiment runs #1131
- `ExperimentBuilder`:
- Add method `build_seeded_collection` for the sound creation of multiple
experiments with varying random seeds #1131
- Add method `copy` to facilitate the creation of multiple experiments from a single builder #1131
- `env`:
- Added new `VectorEnvType` called `SUBPROC_SHARED_MEM_AUTO` and used in for Atari and Mujoco venv creation. #1141
- Loggers can now restore the logged data into python by using the new `restore_logged_data` method. #1074
- Wandb logger extended #1183
- `utils`:
- `net.continuous.Critic`:
- Add flag `apply_preprocess_net_to_obs_only` to allow the
preprocessing network to be applied to the observations only (without
the actions concatenated), which is essential for the case where we want
to reuse the actor's preprocessing network #1128
- `torch_utils` (new module)
- Added context managers `torch_train_mode` and `policy_within_training_step` #1123
- `print`
- `DataclassPPrintMixin` now supports outputting a string, not just printing the pretty repr. #1141
- `SamplingConfig`:
- Add support for `batch_size=None`. #1077
- Add `training_seed` for explicit seeding of training and test
environments, the `test_seed` is inferred from `training_seed`. #1074
- `experiment`:
- `Experiment` now has a `name` attribute, which can be set
using `ExperimentBuilder.with_name` and
which determines the default run name and therefore the persistence
subdirectory.
It can still be overridden in `Experiment.run()`, the new parameter
name being `run_name` rather than
`experiment_name` (although the latter will still be interpreted
correctly). #1074 #1131
- Add class `ExperimentCollection` for the convenient execution of
multiple experiment runs #1131
- `ExperimentBuilder`:
- Add method `build_seeded_collection` for the sound creation of
multiple
experiments with varying random seeds #1131
- Add method `copy` to facilitate the creation of multiple
experiments from a single builder #1131
- `env`:
- Added new `VectorEnvType` called `SUBPROC_SHARED_MEM_AUTO` and used in
for Atari and Mujoco venv creation. #1141
- `utils`:
- `logger`:
- Loggers can now restore the logged data into python by using the
new `restore_logged_data` method. #1074
- Wandb logger extended #1183
- `net.continuous.Critic`:
- Add flag `apply_preprocess_net_to_obs_only` to allow the
preprocessing network to be applied to the observations only (without
the actions concatenated), which is essential for the case where we
want
to reuse the actor's preprocessing network #1128
- `torch_utils` (new module)
- Added context managers `torch_train_mode`
and `policy_within_training_step` #1123
- `print`
- `DataclassPPrintMixin` now supports outputting a string, not just
printing the pretty repr. #1141

### Fixes

- `highlevel`:
- `CriticFactoryReuseActor`: Enable the Critic flag `apply_preprocess_net_to_obs_only` for continuous critics,
fixing the case where we want to reuse an actor's preprocessing network for the critic (affects usages
of the experiment builder method `with_critic_factory_use_actor` with continuous environments) #1128
- Policy parameter `action_scaling` value `"default"` was not correctly transformed to a Boolean value for
algorithms SAC, DDPG, TD3 and REDQ. The value `"default"` being truthy caused action scaling to be enabled
even for discrete action spaces. #1191
- `CriticFactoryReuseActor`: Enable the Critic
flag `apply_preprocess_net_to_obs_only` for continuous critics,
fixing the case where we want to reuse an actor's preprocessing network
for the critic (affects usages
of the experiment builder method `with_critic_factory_use_actor` with
continuous environments) #1128
- Policy parameter `action_scaling` value `"default"` was not correctly
transformed to a Boolean value for
algorithms SAC, DDPG, TD3 and REDQ. The value `"default"` being truthy
caused action scaling to be enabled
even for discrete action spaces. #1191
- `atari_network.DQN`:
- Fix constructor input validation #1128
- Fix `output_dim` not being set if `features_only`=True and `output_dim_added_layer` is not None #1128
- Fix constructor input validation #1128
- Fix `output_dim` not being set if `features_only`=True
and `output_dim_added_layer` is not None #1128
- `PPOPolicy`:
- Fix `max_batchsize` not being used in `logp_old` computation inside `process_fn` #1168
- Fix `max_batchsize` not being used in `logp_old` computation
inside `process_fn` #1168
- Fix `Batch.__eq__` to allow comparing Batches with scalar array values #1185

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. #1063
- Introduced a first iteration of a naming convention for vars in `Collector`s. #1063
- Generally improved readability of Collector code and associated tests (still quite some way to go). #1063

- `Collector`s rely less on state, the few stateful things are stored explicitly
instead of through a `.data` attribute. #1063
- Introduced a first iteration of a naming convention for vars in `Collector`s.
#1063
- Generally improved readability of Collector code and associated tests (still
quite some way to go). #1063
- Improved typing for `exploration_noise` and within Collector. #1063
- Better variable names related to model outputs (logits, dist input etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like `Actor`, `ActorProb`, etc.,
instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the distribution type. #1032
- Better variable names related to model outputs (logits, dist input etc.).
#1032
- Improved typing for actors and critics, using Tianshou classes
like `Actor`, `ActorProb`, etc.,
instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the presence
of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see
associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the
distribution type. #1032
- Exception no longer raised on `len` of empty `Batch`. #1084
- tests and examples are covered by `mypy`. #1077
- `NetBase` is more used, stricter typing by making it generic. #1077
- Use explicit multiprocessing context for creating `Pipe` in `subproc.py`. #1102
- Use explicit multiprocessing context for creating `Pipe` in `subproc.py`.
#1102

### Breaking Changes

- `data`:
- `Collector`:
- Removed `.data` attribute. #1063
- Collectors no longer reset the environment on initialization.
Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . #1063
- Removed `no_grad` argument from `collect` method (was unused in tianshou). #1123
- `Batch`:
- Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`.
Can be considered a bugfix. #1063
- The methods `to_numpy` and `to_torch` in are not in-place anymore
(use `to_numpy_` or `to_torch_` instead). #1098, #1117
- The method `Batch.is_empty` has been removed. Instead, the user can simply check for emptiness of Batch by using `len` on dicts. #1144
- Stricter `cat_`, only concatenation of batches with the same structure is allowed. #1181
- Logging:
- `BaseLogger.prepare_dict_for_logging` is now abstract. #1074
- Removed deprecated and unused `BasicLogger` (only affects users who subclassed it). #1074
- VectorEnvs now return an array of info-dicts on reset instead of a list. #1063
- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to take a single argument in both
continuous and discrete cases. #1032
- `Collector`:
- Removed `.data` attribute. #1063
- Collectors no longer reset the environment on initialization.
Instead, the user might have to call `reset` expicitly or
pass `reset_before_collect=True` . #1063
- Removed `no_grad` argument from `collect` method (was unused in
tianshou). #1123
- `Batch`:
- Fixed `iter(Batch(...)` which now behaves the same way
as `Batch(...).__iter__()`.
Can be considered a bugfix. #1063
- The methods `to_numpy` and `to_torch` in are not in-place anymore
(use `to_numpy_` or `to_torch_` instead). #1098, #1117
- The method `Batch.is_empty` has been removed. Instead, the user can
simply check for emptiness of Batch by using `len` on dicts. #1144
- Stricter `cat_`, only concatenation of batches with the same structure
is allowed. #1181
- `to_torch` and `to_numpy` are no longer static methods.
So `Batch.to_numpy(batch)` should be replaced by `batch.to_numpy()`.
#1200
- `utils`:
- Modules with code that was copied from sensAI have been replaced by imports from new dependency sensAI-utils:
- `tianshou.utils.logging` is replaced with `sensai.util.logging`
- `tianshou.utils.string` is replaced with `sensai.util.string`
- `tianshou.utils.pickle` is replaced with `sensai.util.pickle`
- `utils.net.common.Recurrent` now receives and returns a `RecurrentStateBatch` instead of a dict. #1077
- `AtariEnvFactory` constructor (in examples, so not really breaking) now requires explicit train and test seeds. #1074
- `EnvFactoryRegistered` now requires an explicit `test_seed` in the constructor. #1074
- `logger`:
- `BaseLogger.prepare_dict_for_logging` is now abstract. #1074
- Removed deprecated and unused `BasicLogger` (only affects users who
subclassed it). #1074
- `utils.net`:
- `Recurrent` now receives and returns
a `RecurrentStateBatch` instead of a dict. #1077
- Modules with code that was copied from sensAI have been replaced by
imports from new dependency sensAI-utils:
- `tianshou.utils.logging` is replaced with `sensai.util.logging`
- `tianshou.utils.string` is replaced with `sensai.util.string`
- `tianshou.utils.pickle` is replaced with `sensai.util.pickle`
- `env`:
- All VectorEnvs now return a numpy array of info-dicts on reset instead of
a list. #1063
- `policy`:
- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to take a
single argument in both
continuous and discrete cases. #1032
- `AtariEnvFactory` constructor (in examples, so not really breaking) now
requires explicit train and test seeds. #1074
- `EnvFactoryRegistered` now requires an explicit `test_seed` in the
constructor. #1074
- `highlevel`:
- The parameter `dist_fn` has been removed from the parameter objects (`PGParams`, `A2CParams`, `PPOParams`, `NPGParams`, `TRPOParams`).
The correct distribution is now determined automatically based on the actor factory being used, avoiding the possibility of
misspecification. Persisted configurations/policies continue to work as expected, but code must not specify the `dist_fn` parameter.
#1194 #1195

- `params`: The parameter `dist_fn` has been removed from the parameter
objects (`PGParams`, `A2CParams`, `PPOParams`, `NPGParams`, `TRPOParams`).
The correct distribution is now determined automatically based on the
actor factory being used, avoiding the possibility of
misspecification. Persisted configurations/policies continue to work as
expected, but code must not specify the `dist_fn` parameter.
#1194 #1195
- `env`:
- `EnvFactoryRegistered`: parameter `seed` has been replaced by the pair
of parameters `train_seed` and `test_seed`
Persisted instances will continue to work correctly.
Subclasses such as `AtariEnvFactory` are also affected requires
explicit train and test seeds. #1074
- `VectorEnvType`: `SUBPROC_SHARED_MEM` has been replaced
by `SUBPROC_SHARED_MEM_DEFAULT`. It is recommended to
use `SUBPROC_SHARED_MEM_AUTO` instead. However, persisted configs will
continue working. #1141

### Tests
- Fixed env seeding it `test_sac_with_il.py` so that the test doesn't fail randomly. #1081

- Fixed env seeding it `test_sac_with_il.py` so that the test doesn't fail
randomly. #1081
- Improved CI triggers and added telemetry (if requested by user) #1177
- Improved environment used in tests.
- Improved tests bach equality to check with scalar values #1185

### Dependencies
- [DeepDiff](https://github.com/seperman/deepdiff) added to help with diffs of batches in tests. #1098

- [DeepDiff](https://github.com/seperman/deepdiff) added to help with diffs of
batches in tests. #1098
- Bumped black, idna, pillow
- New extra "eval"
- Bumped numba to >=60.0.0, permitting installation on python 3.12 # 1177
- New dependency sensai-utils

Started after v1.0.0
12 changes: 6 additions & 6 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit f4814c1

Please sign in to comment.