
Commit

Typo docstring (thu-ml#1132)
bordeauxred authored May 1, 2024
1 parent 61426ac commit f31a91d
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions tianshou/policy/base.py
@@ -556,20 +556,20 @@ def compute_episodic_return(
 advantage + value, which is exactly equivalent to using :math:`TD(\lambda)`
 for estimating returns.
 
-Setting v_s_ and v_s to None (or all zeros) and gae_lambda to 1.0 calculates the
+Setting `v_s_` and `v_s` to None (or all zeros) and `gae_lambda` to 1.0 calculates the
 discounted return-to-go/ Monte-Carlo return.
 
 :param batch: a data batch which contains several episodes of data in
 sequential order. Mind that the end of each finished episode of batch
 should be marked by done flag, unfinished (or collecting) episodes will be
 recognized by buffer.unfinished_index().
 :param buffer: the corresponding replay buffer.
-:param numpy.ndarray indices: tell batch's location in buffer, batch is equal
+:param indices: tells the batch's location in buffer, batch is equal
 to buffer[indices].
-:param np.ndarray v_s_: the value function of all next states :math:`V(s')`.
+:param v_s_: the value function of all next states :math:`V(s')`.
 If None, it will be set to an array of 0.
 :param v_s: the value function of all current states :math:`V(s)`. If None,
-it is set based upon v_s_ rolled by 1.
+it is set based upon `v_s_` rolled by 1.
 :param gamma: the discount factor, should be in [0, 1].
 :param gae_lambda: the parameter for Generalized Advantage Estimation,
 should be in [0, 1].
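The docstring in this hunk describes computing returns as GAE advantage + value from `v_s_`, `v_s`, `gamma`, `gae_lambda` and done flags. The following is a minimal pure-Python sketch of that backward recursion, not tianshou's implementation: `gae_returns` is a hypothetical helper that works on one flat sequence of transitions instead of a batch, replay buffer and `indices`, and, as a simplifying assumption, it treats every done flag as a true terminal step (no bootstrapping past it). Defaulting `v_s` to `v_s_` rolled by 1 mirrors the docstring's wording.

```python
from typing import List, Optional, Sequence


def gae_returns(
    rew: Sequence[float],
    done: Sequence[bool],
    v_s_: Optional[Sequence[float]] = None,
    v_s: Optional[Sequence[float]] = None,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> List[float]:
    """Hypothetical sketch: return GAE advantage + V(s) for every step."""
    if not 0.0 <= gamma <= 1.0 or not 0.0 <= gae_lambda <= 1.0:
        raise ValueError("gamma and gae_lambda should be in [0, 1]")
    n = len(rew)
    # V(s'): default to an array of zeros when not given.
    v_next = [0.0] * n if v_s_ is None else [float(x) for x in v_s_]
    # V(s): default to V(s') rolled by 1, i.e. v_cur[i] = v_next[i - 1]
    # (the first element wraps around, as a roll does).
    v_cur = v_next[-1:] + v_next[:-1] if v_s is None else [float(x) for x in v_s]

    advantages = [0.0] * n
    gae = 0.0
    # Sweep backwards; a done flag (assumed terminal here) disables the
    # bootstrap term and stops the accumulated advantage from leaking
    # into the previous episode.
    for i in reversed(range(n)):
        not_done = 0.0 if done[i] else 1.0
        delta = rew[i] + gamma * not_done * v_next[i] - v_cur[i]
        gae = delta + gamma * gae_lambda * not_done * gae
        advantages[i] = gae

    # Returns are advantage + value.
    return [a + v for a, v in zip(advantages, v_cur)]


if __name__ == "__main__":
    # v_s_ and v_s left as None, gae_lambda = 1.0: discounted return-to-go.
    print(gae_returns([1.0, 1.0, 1.0], [False, False, True], gamma=0.5, gae_lambda=1.0))
    # prints [1.75, 1.5, 1.0]
```

With values omitted and `gae_lambda=1.0`, each output is simply r_t + gamma * G_{t+1}, reset at every done flag, which is the Monte-Carlo special case the docstring mentions. With `gae_lambda=0.0` the output instead collapses to the one-step target r_t + gamma * V(s').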
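The docstring's claim that advantage + value is exactly equivalent to using TD(λ) can be checked with a short induction. This derivation assumes that the V(s') used at step t is the same number as the V(s) used at step t+1 within an episode, and that a terminal step bootstraps from 0 (so the advantage and value after it are both 0). The symbols δ_t, Â_t and G^λ_t are notation introduced here for the derivation.

```latex
\begin{aligned}
\delta_t &= r_t + \gamma V(s_{t+1}) - V(s_t), \qquad
\hat{A}_t = \delta_t + \gamma\lambda\,\hat{A}_{t+1}, \\
G^{\lambda}_t &= r_t + \gamma\big[(1-\lambda)\,V(s_{t+1}) + \lambda\,G^{\lambda}_{t+1}\big]. \\[4pt]
\text{If } \hat{A}_{t+1} + V(s_{t+1}) = G^{\lambda}_{t+1}:\quad
\hat{A}_t + V(s_t) &= r_t + \gamma V(s_{t+1}) + \gamma\lambda\,\hat{A}_{t+1} \\
&= r_t + \gamma V(s_{t+1}) + \gamma\lambda\big(G^{\lambda}_{t+1} - V(s_{t+1})\big) \\
&= r_t + \gamma\big[(1-\lambda)\,V(s_{t+1}) + \lambda\,G^{\lambda}_{t+1}\big] = G^{\lambda}_t .
\end{aligned}
```

With λ = 1 and V ≡ 0 the recursion collapses to G_t = r_t + γ G_{t+1}, the discounted return-to-go that the docstring says is obtained by setting `v_s_` and `v_s` to None and `gae_lambda` to 1.0.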
