Question about the TRPO or PPO code #96

Open
lz200202 opened this issue Nov 16, 2024 · 0 comments

Comments

@lz200202

image
log_probs is computed immediately after old_log_probs, and no policy-gradient parameter update happens between the two computations. Why are they not equal?
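
To make the question concrete, here is a minimal sketch of the usual PPO update structure. The screenshot is not reproduced here, so this is not the repository's code: the tabular softmax policy, `ratios_per_epoch`, and every parameter name other than old_log_probs and log_probs are hypothetical, and the clipping term is left out for brevity. It assumes old_log_probs is computed once before a multi-epoch loop while log_probs is recomputed inside it. Under that assumption the two are identical on the first pass and only start to differ once the parameters have been updated.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_log_probs(theta, states, actions):
    """Log-probability of each taken action under a tabular softmax policy.

    theta[s][a] is the logit of action a in state s (hypothetical policy).
    """
    return [math.log(softmax(theta[s])[a]) for s, a in zip(states, actions)]


def ratios_per_epoch(theta, states, actions, advantages, epochs=3, lr=0.1):
    """Return the probability ratios seen at the start of each epoch."""
    # Snapshot taken once, before any update; it stays fixed for all epochs.
    old_log_probs = action_log_probs(theta, states, actions)
    history = []
    for _ in range(epochs):
        # Recomputed every epoch with the current parameters.
        log_probs = action_log_probs(theta, states, actions)
        ratio = [math.exp(n - o) for n, o in zip(log_probs, old_log_probs)]
        history.append(ratio)
        # Gradient ascent on sum(ratio * advantage), unclipped for brevity.
        grads = [[0.0] * len(row) for row in theta]
        for s, a, r, adv in zip(states, actions, ratio, advantages):
            probs = softmax(theta[s])
            for k in range(len(probs)):
                indicator = 1.0 if k == a else 0.0
                grads[s][k] += r * adv * (indicator - probs[k])
        for s, row in enumerate(grads):
            for k, g in enumerate(row):
                theta[s][k] += lr * g
    return history


theta = [[0.0, 0.0], [0.0, 0.0]]
history = ratios_per_epoch(theta, states=[0, 1], actions=[0, 1],
                           advantages=[1.0, -1.0])
for epoch, ratio in enumerate(history):
    print(epoch, ratio)
```

In this sketch the ratio is exactly 1 in the first epoch, yet the gradient of the surrogate objective is still nonzero, so the first update moves the parameters and the later epochs see ratios different from 1. If the code in the screenshot shows unequal values before any update at all, the cause would have to be something this sketch does not model.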
