This repository has been archived by the owner on Oct 16, 2022. It is now read-only.

Taking 'done' into consideration while calculating returns #1

Open
murtazabasu opened this issue Dec 28, 2019 · 1 comment

Comments

@murtazabasu

Hello, thank you for making this repo.
I think the return calculation should take done into account, so that the running return is reset at each episode boundary:


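    def calculate_returns(self, rewards, dones, normalize=True):
        # Assumes torch is imported and `device` is defined at module level.
        returns = []
        R = 0
        # Walk backwards through the buffer, accumulating the discounted return.
        for r, d in zip(reversed(rewards), reversed(dones)):
            if d:
                # Episode boundary: rewards after this step belong to another episode.
                R = 0
            R = r + R * self.gamma
            returns.append(R)
        returns.reverse()  # restore chronological order

        returns = torch.tensor(returns).to(device)

        if normalize:
            # Standardize the returns to zero mean and unit variance.
            returns = (returns - returns.mean()) / returns.std()

        return returns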

Also, could you please briefly describe Generalized Advantage Estimation (GAE) where the advantages are calculated?

@bentrevett
Owner

Notebooks 1-7 all use Monte Carlo methods. That is, each environment is run for a single episode, i.e. until the environment returns done = True, and only then do we calculate the returns/advantages and update the policy parameters.

There is no need to check for done when calculating the returns/advantages, because only the last state will have done = True, which is why R is initialized to zero.
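To make the point concrete, here is a minimal sketch (not the notebooks' code) comparing the two approaches on plain Python lists. The function names `returns_ignoring_done` and `returns_with_done`, the reward values, and `gamma = 0.5` are all assumptions chosen for illustration. For a buffer holding a single episode the two agree; they only diverge if a buffer were to hold more than one episode.

```python
def returns_ignoring_done(rewards, gamma):
    """Discounted returns, assuming the buffer holds exactly one episode."""
    returns, R = [], 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()  # back to chronological order
    return returns


def returns_with_done(rewards, dones, gamma):
    """Discounted returns that reset at every step flagged as done."""
    returns, R = [], 0.0
    for r, d in zip(reversed(rewards), reversed(dones)):
        if d:
            R = 0.0  # nothing after a terminal step belongs to this episode
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns


gamma = 0.5  # hypothetical discount factor, chosen so the arithmetic is exact

# One episode: only the last step has done = True, so both versions agree.
single_rewards = [1.0, 1.0, 1.0]
single_dones = [False, False, True]
single_a = returns_ignoring_done(single_rewards, gamma)
single_b = returns_with_done(single_rewards, single_dones, gamma)

# Two episodes in one buffer (not what notebooks 1-7 do): the versions differ,
# because ignoring done lets the second episode's rewards leak into the first.
multi_rewards = [1.0, 1.0, 1.0, 1.0]
multi_dones = [False, True, False, True]
multi_a = returns_ignoring_done(multi_rewards, gamma)
multi_b = returns_with_done(multi_rewards, multi_dones, gamma)

print(single_a, single_b)
print(multi_a, multi_b)
```

So the done check would only matter if rollouts spanning several episodes were collected before an update.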

I'll add an explanation of GAE when I get around to adding more detail to the notebooks. For now I'd recommend these two links:
