Add option to collect same number of episodes in each collector env #1046

bordeauxred · 2024-02-07T16:34:40Z

I have added the correct label(s) to this Pull Request or linked the relevant issue(s)
I have provided a description of the changes in this Pull Request
I have added documentation for my changes
If applicable, I have added tests to cover my changes.
I have reformatted the code using poe format
I have checked style and types with poe lint and poe type-check
(Optional) I ran tests locally with poe test (or a subset of them with poe test-reduced), and they pass
(Optional) I have tested that documentation builds correctly with poe doc-build


bordeauxred · 2024-02-07T16:37:21Z

@MischaPanch

tianshou/data/collector.py

MischaPanch · 2024-02-08T12:30:59Z

tianshou/data/collector.py

@@ -476,6 +526,9 @@ def collect(

        :return: A dataclass object
        """
+        assert (


Why should we allow a param if it can't actually be changed?


tianshou/data/buffer/base.py

MischaPanch · 2024-02-16T16:00:11Z

tianshou/data/buffer/manager.py

            self._lengths[buffer_id] = len(self.buffers[buffer_id])
-        ep_last_idxs = np.array(ep_last_idxs)
+        ep_add_at_idxs = np.array(ep_add_at_idxs)


What are the semantics of this variable? The docstring calls it the current index. If it's not ep_last_idxs, what does it mean?

The index at which the transition is added to the buffer.
Since ep_start_idx indicates the first transition of the current episode, ep_last_idx should be the index of the last transition in the episode. Whenever the current transition does not contain done, this index is not the last index of the episode (the episode continues) but the index at which the current transition is added.
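The distinction drawn in this reply can be illustrated with a toy sketch, assuming a single fixed-size circular buffer. `ToyEpisodeBuffer`, its `add` method, and the returned tuple are hypothetical names for illustration only; this is not Tianshou's buffer API.

```python
class ToyEpisodeBuffer:
    """Tracks where each transition is written and where its episode began."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._insert_idx = 0    # slot the next transition is written to
        self._ep_start_idx = 0  # slot of the current episode's first transition

    def add(self, done: bool) -> tuple[int, int, bool]:
        """Return (add_at_idx, ep_start_idx, is_last_of_episode)."""
        add_at_idx = self._insert_idx
        ep_start_idx = self._ep_start_idx
        self._insert_idx = (self._insert_idx + 1) % self.size
        if done:
            # The next transition opens a new episode.
            self._ep_start_idx = self._insert_idx
        return add_at_idx, ep_start_idx, done


buf = ToyEpisodeBuffer(size=8)
steps = [buf.add(done=d) for d in (False, False, True, False)]
```

Only the third call returns an index that is also the last index of an episode; for the other calls the first element is merely the slot the transition was added at.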

Marking for discussion in pair programming

tianshou/data/buffer/manager.py

MischaPanch · 2024-02-19T10:21:54Z

tianshou/data/collector.py

+            collect_time=collect_call_duration,
+            collect_speed=step_count / collect_call_duration,
+            returns=np.array(episode_returns),
+            returns_stat=SequenceSummaryStats.from_sequence(episode_returns)


Let's move the declaration of this and lens_stat above the instantiation of the object; that is a bit easier to read.

I'm also wondering whether this could be done in the CollectStatsBase itself in post_init, let's have a look together later
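One way the post_init idea could look, as a minimal sketch: derived statistics are excluded from the constructor and computed in `__post_init__`. `ToySummaryStats` and `ToyCollectStats` are hypothetical stand-ins written for this example; they are not the `SequenceSummaryStats` or `CollectStatsBase` classes from the PR.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class ToySummaryStats:
    mean: float
    std: float

    @classmethod
    def from_sequence(cls, seq: np.ndarray) -> "ToySummaryStats":
        return cls(mean=float(np.mean(seq)), std=float(np.std(seq)))


@dataclass
class ToyCollectStats:
    returns: np.ndarray
    lens: np.ndarray
    # Derived fields: excluded from __init__, filled in by __post_init__.
    returns_stat: Optional[ToySummaryStats] = field(init=False, default=None)
    lens_stat: Optional[ToySummaryStats] = field(init=False, default=None)

    def __post_init__(self) -> None:
        # Leave the stats as None when no episode finished during the collect call.
        if len(self.returns) > 0:
            self.returns_stat = ToySummaryStats.from_sequence(self.returns)
        if len(self.lens) > 0:
            self.lens_stat = ToySummaryStats.from_sequence(self.lens)


stats = ToyCollectStats(returns=np.array([1.0, 3.0]), lens=np.array([]))
```

With this layout the call site passes only the raw arrays, so the conditional expressions no longer sit inside the constructor call.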

MischaPanch · 2024-02-19T10:23:56Z

tianshou/data/collector.py

+    def reset_env(
+        self,
+        gym_reset_kwargs: dict[str, Any] | None = None,
+        set_obs_next_to_obs: bool = False,


This is non-trivial functionality. Why is it needed, and when would one want to use it? Please add a docstring.

tianshou/data/collector.py

MischaPanch · 2024-02-19T10:28:36Z

tianshou/data/collector.py


-            if (n_step and step_count >= n_step) or (n_episode and episode_count >= n_episode):
-                break
+                if n_episode:


Marking this block for discussion:

Why is gym_reset_kwargs used in only one place?

Factor out into a separate method?

Avoid np.where; use getitem on the boolean array instead.
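For reference, a small NumPy sketch of the last point. The arrays `env_ids` and `done` are made-up illustration data, not taken from the PR.

```python
import numpy as np

env_ids = np.array([0, 1, 2, 3])
done = np.array([False, True, False, True])

# np.where with a single argument returns a tuple of index arrays ...
done_idx = np.where(done)[0]
done_env_ids_via_where = env_ids[done_idx]

# ... while indexing with the boolean array selects the same entries directly.
done_env_ids_via_mask = env_ids[done]

assert np.array_equal(done_env_ids_via_where, done_env_ids_via_mask)

# Negating the mask keeps the envs that are still running.
running_env_ids = env_ids[~done]
```

The mask form avoids the intermediate index array and the `[0]` unpacking.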

MischaPanch · 2024-02-19T10:33:50Z

tianshou/data/collector.py

-            lens_stat=SequenceSummaryStats.from_sequence(episode_lens)
-            if len(episode_lens) > 0
-            else None,
+    def sample_at_least_one_episode_per_worker_postprocessing_on_done_env(


This doesn't seem to sample anything; it only filters.

Marking for discussion

MischaPanch · 2024-02-19T10:34:11Z

tianshou/data/collector.py

+            self.data = self.data[mask]
+        return ready_env_ids
+
+    def sample_equal_episodes_per_worker_postprocessing_on_done_env(


same as above
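To make the "filter, not sample" observation concrete, here is a minimal sketch of such a filter for the equal-episodes case, assuming a per-env counter of finished episodes. `filter_envs_below_quota`, `episodes_done_per_env`, and `episodes_per_env` are hypothetical names chosen for this example and do not come from the PR.

```python
import numpy as np


def filter_envs_below_quota(
    ready_env_ids: np.ndarray,
    episodes_done_per_env: np.ndarray,
    episodes_per_env: int,
) -> np.ndarray:
    """Keep only the env ids that still have episodes left to collect."""
    still_needed = episodes_done_per_env[ready_env_ids] < episodes_per_env
    return ready_env_ids[still_needed]


ready = np.array([0, 1, 2])
finished = np.array([2, 1, 2])  # episodes completed so far, indexed by env id
remaining = filter_envs_below_quota(ready, finished, episodes_per_env=2)
```

A name built around "filter" or "remove finished envs" would describe this behaviour more directly than one built around "sample".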

MischaPanch · 2024-02-19T10:35:59Z

tianshou/data/collector.py

+            done=done,
+            info=info,
+        )
+        if self.preprocess_fn:


marking for discussion - when is this option used? Is it dangerous to update data twice?

MischaPanch · 2024-02-19T10:37:02Z

tianshou/data/collector.py

+            except TypeError:  # envpool's action space is not for per-env
+                act_sample = [self._action_space.sample() for _ in ready_env_ids]
+            act_sample = self.policy.map_action_inverse(act_sample)  # type: ignore
+            self.data.update(act=act_sample)


From the method name I wouldn't expect it to update data. Generally, data is modified in place all over the class; we should avoid that where possible.
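A sketch of the alternative hinted at here: the helper returns the sampled actions and leaves it to the caller to store them. `sample_random_actions` and the dictionary `batch` are hypothetical and simplified; the real collector works with Tianshou's own data structures and action spaces.

```python
import numpy as np


def sample_random_actions(
    n_envs: int, n_actions: int, rng: np.random.Generator
) -> np.ndarray:
    """Return one random discrete action per env without touching shared state."""
    return rng.integers(low=0, high=n_actions, size=n_envs)


rng = np.random.default_rng(0)
batch = {"obs": np.zeros((3, 4))}
act = sample_random_actions(n_envs=3, n_actions=2, rng=rng)

# The caller decides whether and where to store the result.
batch_with_act = {**batch, "act": act}
```

Because the function has no side effects, its name and its behaviour stay aligned, and the original `batch` is left unchanged.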

MischaPanch

Overall, a much cleaner and more readable structure than before, but there are still some issues to address. Let's talk later today

MischaPanch · 2024-03-26T18:26:18Z

Closing this; @bordeauxred will make a new one after #1063 is merged.

Add draft of optional argument to collect same number of episodes in each env in collector

5af267b

run poe, fix all but one issue

b454254

MischaPanch requested changes Feb 8, 2024

Michael Panchenko and others added 7 commits February 8, 2024 15:40

Collector: factored out bits of collect method, removed set fields outside of init

09cb23d

Fix error in reset to next, modularise collect further

167cbcb

Fix mypy

3c75ede

Add docstring to functions

dc8cd75

Continue refactor of collect of collector. Careful, this commit is not recording worker wise returns!

92317ae

Minor improvements to buffer

7aa05b8

Refactor collect to remove collect stats collector

f8fb430

MischaPanch reviewed Feb 16, 2024

Address Mischa's comments

b9c29ab

MischaPanch reviewed Feb 19, 2024

tianshou/data/buffer/manager.py Outdated

MischaPanch reviewed Feb 19, 2024

tianshou/data/collector.py Outdated

MischaPanch reviewed Feb 19, 2024

MischaPanch mentioned this pull request Feb 20, 2024

Remove data from state in Collector, and remove preprocess_fn there #1058

Closed

bordeauxred added 2 commits February 21, 2024 10:56

Capitalise constants

f4f49f0

Refactor collect(). WIP!

725d103

MischaPanch closed this Mar 26, 2024
