This repository has been archived by the owner on Oct 16, 2022. It is now read-only.
Hello, thank you for making this repo.
I think that while calculating the returns you should take `done` into consideration, like this:
```python
def calculate_returns(self, rewards, dones, normalize=True):
    returns = []
    R = 0
    for r, d in zip(reversed(rewards), reversed(dones)):
        if d:
            # reset the running return at episode boundaries
            R = 0
        R = r + R * self.gamma
        returns.insert(0, R)
    returns = torch.tensor(returns).to(device)
    if normalize:
        returns = (returns - returns.mean()) / returns.std()
    return returns
```
Also, could you please briefly describe Generalized Advantage Estimation (GAE) as it applies to calculating the advantages?
Notebooks 1-7 all use Monte Carlo methods. That is, each environment is run for a single episode, i.e. until the environment returns `done = True`, after which we calculate the returns/advantages and update the policy parameters.
There is no need to check for `done` in the calculation of the returns/advantages, as only the last state will have `done = True`, which is why `R` is initialized to zero.
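To make that concrete, here is a minimal sketch of what the single-episode (Monte Carlo) return calculation amounts to; the normalization mirrors the snippet above, and passing `gamma` directly (rather than reading it from `self`) is just for illustration:

```python
import torch

def calculate_episode_returns(rewards, gamma=0.99, normalize=True):
    """Discounted returns for one complete episode (Monte Carlo)."""
    returns = []
    R = 0  # return after the final step is 0, since the episode has ended
    for r in reversed(rewards):
        R = r + R * gamma          # R_t = r_t + gamma * R_{t+1}
        returns.insert(0, R)
    returns = torch.tensor(returns)
    if normalize:
        returns = (returns - returns.mean()) / returns.std()
    return returns
```

Because the rollout stops exactly when `done = True`, starting `R` at zero already accounts for the terminal state, which is the point above.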
I'll add an explanation of GAE when I get around to adding more detail to the notebooks - for now I'd recommend these two links:
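In the meantime, here is a minimal sketch of the GAE recursion for a single completed episode; the `values` argument (the critic's estimates V(s_0)...V(s_{T-1})) and the `gae_lambda` parameter are assumptions for illustration, not code from the notebooks:

```python
import torch

def calculate_gae(rewards, values, gamma=0.99, gae_lambda=0.95):
    """GAE advantages for one complete episode."""
    advantages = []
    advantage = 0.0
    next_value = 0.0  # V(s_T) = 0 because the episode has terminated
    for r, v in zip(reversed(rewards), reversed(values)):
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = r + gamma * next_value - v
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        advantage = delta + gamma * gae_lambda * advantage
        advantages.insert(0, advantage)
        next_value = v
    return torch.tensor(advantages)
```

With `gae_lambda = 1` this reduces to the Monte Carlo return minus the baseline V(s_t), and with `gae_lambda = 0` it reduces to the one-step TD error, so lambda trades bias against variance.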