
Is the entropy regularization in the PPO code implementation incorrect? #171

Closed
ZhangNy301 opened this issue Feb 17, 2025 · 2 comments

Comments

@ZhangNy301

In the PPO code implementation, an entropy regularization term is added when computing actor_loss. My understanding is that it is there to encourage the new policy to explore more, i.e., to increase the entropy, but the implementation in the code appears to do the opposite. See below:
# compute actor loss
actor_loss = -torch.min(surr1, surr2).mean() + self.entropy_coef * dist.entropy().mean()
The expected effect is: 1. minimizing actor_loss -> 2. minimizes the entropy term self.entropy_coef * dist.entropy().mean() -> 3. encourages more deterministic outputs.
This confuses me: is the sign of the entropy term reversed?
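
For reference, a minimal runnable sketch of the sign convention described in the question (subtracting the entropy bonus from the loss); the tensor values, eps_clip, and entropy_coef below are made-up toy assumptions, not the repository's code:

import torch
from torch.distributions import Categorical

# Toy, made-up values so the snippet runs on its own; not the repo's code.
ratio = torch.tensor([1.1, 0.9, 1.05])        # pi_new(a|s) / pi_old(a|s)
advantage = torch.tensor([0.5, -0.2, 1.0])
eps_clip, entropy_coef = 0.2, 0.01
dist = Categorical(logits=torch.randn(3, 4))  # stand-in action distribution

surr1 = ratio * advantage
surr2 = torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * advantage

# Sign convention described in the question: subtract the entropy bonus,
# so minimizing the loss pushes entropy up and encourages exploration.
actor_loss = -torch.min(surr1, surr2).mean() - entropy_coef * dist.entropy().mean()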

@qiwang067
Contributor

@johnjim0816


@johnjim0816
Contributor


No, it isn't reversed. The actor's loss is negative to begin with; it is negated here so that it can be trained with gradient descent together with the value loss.
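
For context, a minimal sketch of the point above: the surrogate objective is written as a (negated) loss so that one gradient-descent step can minimize it together with the value loss. The actor/critic modules, tensor shapes, and coefficient below are toy assumptions, not the repository's classes:

import torch
import torch.nn.functional as F
from torch.distributions import Categorical

# Hypothetical toy actor-critic; names and shapes are assumptions.
actor = torch.nn.Linear(4, 2)    # outputs action logits
critic = torch.nn.Linear(4, 1)   # outputs state value
optimizer = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
old_log_probs = torch.randn(8)
advantages = torch.randn(8)
returns = torch.randn(8, 1)
eps_clip = 0.2

dist = Categorical(logits=actor(states))
ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
surr1 = ratio * advantages
surr2 = torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * advantages

# Negating the surrogate turns "maximize the objective" into "minimize a loss",
# so it can share one gradient-descent step with the value (critic) loss.
actor_loss = -torch.min(surr1, surr2).mean()
critic_loss = F.mse_loss(critic(states), returns)

loss = actor_loss + 0.5 * critic_loss
optimizer.zero_grad()
loss.backward()
optimizer.step()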
