In the PPO code implementation, an entropy regularization term is added when computing actor_loss. My understanding is that the term is there to encourage the new policy to explore more, i.e., to increase entropy, but the code seems to do the opposite:

# compute actor loss
actor_loss = -torch.min(surr1, surr2).mean() + self.entropy_coef * dist.entropy().mean()

The expected consequence is: 1. minimizing actor_loss -> 2. minimizing the entropy term self.entropy_coef * dist.entropy().mean() -> 3. encouraging more deterministic outputs. This confuses me: is the sign of the entropy term reversed?
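For reference, the PPO paper's combined objective (clipped surrogate plus an entropy bonus) is something to be maximized; when it is rewritten as a loss for gradient descent, the entropy bonus is therefore subtracted. Below is a minimal, self-contained sketch of that convention; the toy tensors and the names entropy_coef and clip_eps are illustrative only, not the repository's code.

import torch
from torch.distributions import Categorical

# Toy, illustrative data: a batch of 8 states with 4 discrete actions each.
logits = torch.randn(8, 4, requires_grad=True)
dist = Categorical(logits=logits)
actions = dist.sample()
old_log_probs = dist.log_prob(actions).detach()  # stands in for the old policy
advantages = torch.randn(8)
entropy_coef = 0.01
clip_eps = 0.2

# Clipped surrogate objective from the PPO paper (a quantity to maximize).
ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
surr1 = ratio * advantages
surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

# Turning "maximize surrogate + entropy bonus" into a loss to minimize negates
# both terms, so the entropy bonus enters with a minus sign here.
actor_loss = -torch.min(surr1, surr2).mean() - entropy_coef * dist.entropy().mean()
actor_loss.backward()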
@johnjim0816
It is not reversed. The actor's loss is inherently negative (it is a negated objective); it is negated here so that it can be optimized by gradient descent together with the value loss.