When rewards-to-go is computed in the on-policy `ALGOBuffer.finish_path()`, it looks like all the necessary information has already been recorded in `rew_buf`. The `last_val` appended to `rews` seems to be just a placeholder, because the action has already been sent out to the agent and the reward has been seen and recorded.
Using `last_val` for the value function makes total sense for bootstrapping, but for the reward?
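To make the question concrete, here is a minimal sketch of what appending `last_val` to the rewards does to the discounted rewards-to-go. It assumes the buffer appends `last_val` to the reward slice, takes a discounted cumulative sum, and drops the extra entry. The helper names `discount_cumsum` and `rewards_to_go` are hypothetical and are not claimed to match the library's code.

```python
def discount_cumsum(xs, gamma):
    """For each t, return sum over k >= t of gamma**(k - t) * xs[k]."""
    out = [0.0] * len(xs)
    running = 0.0
    # Walk backwards so each entry reuses the discounted tail after it.
    for t in reversed(range(len(xs))):
        running = xs[t] + gamma * running
        out[t] = running
    return out


def rewards_to_go(rews, gamma, last_val=0.0):
    """Hypothetical helper: rewards-to-go with an optional bootstrap value.

    last_val stands in for the return after the final recorded step.
    It is appended, discounted back through the path, then dropped.
    """
    return discount_cumsum(list(rews) + [last_val], gamma)[:-1]


rews = [1.0, 1.0, 1.0]
gamma = 0.5

# Episode really ended: last_val = 0 changes nothing.
terminal = rewards_to_go(rews, gamma, last_val=0.0)

# Path was cut off early: last_val carries the estimated remaining return.
cut_off = rewards_to_go(rews, gamma, last_val=4.0)

print(terminal)  # [1.75, 1.5, 1.0]
print(cut_off)   # [2.25, 2.5, 3.0]
```

Under these assumptions, `last_val` is a no-op for a path that reached a terminal state, and only matters when the path was truncated and rewards after the cutoff were never observed.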
My thinking is the rewards-to-go might be this
instead of