On line 204 we call self.transform_reward(), which replaces the contents of the reward array with the discounted rewards, so by line 280 `reward` already holds the discounted values. Hope that clarifies.
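For readers following along, here is a minimal sketch of what such an in-place transformation might look like. It is not the repository's actual code: the function name `discount_rewards_in_place` and the `gamma` parameter are assumptions made for illustration, and it assumes a single episode with no terminal-state bootstrapping.

```python
def discount_rewards_in_place(rewards, gamma):
    """Overwrite each reward with the discounted sum of it and all later rewards.

    Hypothetical helper for illustration; assumes `rewards` is a mutable
    sequence of floats covering one complete episode.
    """
    # Walk backwards so rewards[t + 1] already holds the discounted return
    # from step t + 1 onward when we update rewards[t].
    for t in range(len(rewards) - 2, -1, -1):
        rewards[t] += gamma * rewards[t + 1]
    return rewards


rewards = [1.0, 1.0, 1.0]
discount_rewards_in_place(rewards, gamma=0.5)
print(rewards)  # [1.75, 1.5, 1.0]
```

After a call like this, the array that used to hold raw rewards holds discounted returns, which is why a later fit against `reward` would be fitting against the discounted values.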
Hi there, thanks for sharing your code. I think there's an error on line 280 in main.py
critic_loss = self.critic.fit([obs], [reward], batch_size=BATCH_SIZE, shuffle=True, epochs=EPOCHS, verbose=False)
Shouldn't the critic be fit to the discounted_returns instead of the rewards? That is, the line should read:
critic_loss = self.critic.fit([obs], [discounted_returns], batch_size=BATCH_SIZE, shuffle=True, epochs=EPOCHS, verbose=False)