KL div wrong way around? #5

cooijmanstim · 2023-03-01T01:43:03Z

Line 500 in 6b07e89

kl_div = (curr * (jnp.log(curr) - jnp.log(target))).sum(axis=-1).mean()

The KL divergence looks like it's the wrong way around. Typically you want the expectation to be under the target distribution. Certainly in the paper it's written as KL(old||new), which would be -old * (log(new) - log(old)) summed over actions, i.e. the expectation is taken under old rather than under curr as in the code. Not sure what the implications are if it is indeed the wrong way around.
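To make the difference between the two directions concrete, here is a minimal sketch in JAX. It assumes `curr` and `target` are batches of probability vectors over actions (last axis), with `target` playing the role of the old policy. The function names `kl_reverse` and `kl_forward` are hypothetical and not taken from the repository; `kl_reverse` mirrors the expression at line 500.

```python
import jax.numpy as jnp


def kl_reverse(curr, target):
    # KL(curr || target): expectation taken under curr.
    # Same expression as the one quoted from line 500.
    return (curr * (jnp.log(curr) - jnp.log(target))).sum(axis=-1).mean()


def kl_forward(curr, target):
    # KL(target || curr): expectation taken under target.
    # Matches the paper's KL(old || new), assuming target is the old policy.
    return (target * (jnp.log(target) - jnp.log(curr))).sum(axis=-1).mean()


# Hypothetical example: a batch of two states, two actions each.
curr = jnp.array([[0.9, 0.1], [0.5, 0.5]])
target = jnp.array([[0.5, 0.5], [0.5, 0.5]])

print(float(kl_reverse(curr, target)))  # ≈ 0.1840
print(float(kl_forward(curr, target)))  # ≈ 0.2554
```

The two values differ because KL is not symmetric. Switching directions amounts to swapping the arguments, since `kl_forward(curr, target)` is the same computation as `kl_reverse(target, curr)`.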


Silent-Zebra · 2023-03-02T05:52:22Z

Hi - thanks for the comment. I believe you are correct that there's an inconsistency between the paper and the code.

Regarding implications: my initial guess is that it shouldn't make too much of a difference. However, the hyperparameters may not work as is and may have to be tuned for that new change (which could take a lot of time for the best results). If I have time, I will rerun experiments and see if I can reproduce the results using the KL the other way, and if not, how big the difference is.

I found the paper "Revisiting Design Choices in Proximal Policy Optimization", which compares forward and reverse KL for PPO and suggests that in practice there tends not to be a big difference; where there is one, the reverse KL (the direction used in my code, not in my paper) appears to have a slight advantage. If it turns out that the reverse KL actually plays a big part in the performance of POLA-DiCE (i.e., if the coin game results with forward KL remain worse even after extensive hyperparameter tuning), then it might be an interesting follow-up research project to investigate exactly why that is (why the difference is magnified in the multi-agent setting compared to, e.g., the linked paper) and what the broader implications might be for multi-agent RL.
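For reference, the two directions discussed in this thread can be written out as below. The notation is assumed for illustration: $\pi_{\text{old}}$ stands for `target` and $\pi_{\text{new}}$ for `curr`, and the code additionally averages over the batch with `.mean()`.

$$
\mathrm{KL}(\pi_{\text{old}} \,\|\, \pi_{\text{new}}) = \sum_a \pi_{\text{old}}(a)\left[\log \pi_{\text{old}}(a) - \log \pi_{\text{new}}(a)\right] \qquad \text{(forward; as written in the paper)}
$$

$$
\mathrm{KL}(\pi_{\text{new}} \,\|\, \pi_{\text{old}}) = \sum_a \pi_{\text{new}}(a)\left[\log \pi_{\text{new}}(a) - \log \pi_{\text{old}}(a)\right] \qquad \text{(reverse; as computed at line 500)}
$$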

Silent-Zebra · 2023-07-03T21:28:02Z

Closing as fixes have been implemented and experiments rerun (see updated readme)

Silent-Zebra closed this as completed Jul 3, 2023
