Rewards-to-Go issue #433

Raymondliz · 2024-12-30T13:37:59Z

In the rewards-to-go computation in the on-policy ALGOBuffer.finish_path(), it looks like all the necessary information has already been recorded in rew_buf. The last_val appended to rews seems to be just a placeholder, because by that point the action has already been sent out and its reward has been observed and recorded.
Appending last_val to the value estimates makes total sense for bootstrapping, but why append it to the rewards?
My thinking is that the rewards-to-go should be computed as

self.ret_buf[path_slice] = core.discount_cumsum(rews[:-1], self.gamma)

instead of

self.ret_buf[path_slice] = core.discount_cumsum(rews, self.gamma)[:-1]
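
To make the difference between the two expressions concrete, here is a minimal sketch. It assumes discount_cumsum returns, at each index, the discounted sum of the current entry and all later entries; the helper below is a plain NumPy stand-in written for illustration, not the library's own implementation. The rewards, gamma, and last_val are made-up numbers, and rews is assumed to be the path's rewards with last_val appended.

```python
import numpy as np

def discount_cumsum(x, gamma):
    """Illustrative stand-in: out[t] = sum over k >= 0 of gamma**k * x[t + k]."""
    out = np.zeros(len(x), dtype=np.float64)
    running = 0.0
    # Accumulate from the end of the path backwards.
    for t in reversed(range(len(x))):
        running = x[t] + gamma * running
        out[t] = running
    return out

gamma = 0.9
path_rews = np.array([1.0, 2.0, 3.0])  # hypothetical rewards stored in rew_buf for one path
last_val = 10.0                        # hypothetical value passed to finish_path()

# Assumed construction of rews: path rewards with last_val appended.
rews = np.append(path_rews, last_val)

# Proposed form: drop last_val before summing.
proposed = discount_cumsum(rews[:-1], gamma)

# Existing form: sum with last_val included, then drop the final entry.
current = discount_cumsum(rews, gamma)[:-1]

# The gap at step t is last_val discounted over the remaining T - t steps.
T = len(path_rews)
bootstrap_term = last_val * gamma ** (T - np.arange(T))

print("proposed:", proposed)
print("current: ", current)
print("gap:     ", current - proposed)
```

Under these assumptions the two forms differ at step t by last_val * gamma**(T - t), so they produce identical rewards-to-go exactly when last_val is 0.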
