Hanabi obl pytorch #63

Merged
merged 6 commits into from
Mar 5, 2024
Conversation

ravihammond
Copy link
Collaborator

PyTorch OBL now works.

@mttga mttga merged commit aae9f99 into hanabi_obl_aligned Mar 5, 2024
6 checks passed
@mttga mttga deleted the hanabi_obl_pytorch branch March 5, 2024 13:36
@ravihammond ravihammond restored the hanabi_obl_pytorch branch March 15, 2024 16:51
@hnekoeiq
Copy link

Hi @ravihammond and @mttga,

Thanks for the great repo!
Have you been able to reproduce the Hanabi IQL/VDN results? I just tried with the config file `qlearn_hanabi.yaml` (`python baselines/QLearning/iql.py +alg=qlearn_hanabi +env=hanabi`), and the following is the agent's performance after almost 200 million timesteps:
[screenshot: training curve of the agent's performance]

Looking at the original Hanabi paper, 100 million steps should be enough to reach a score of around 20. I'd appreciate it if you could share your thoughts.

@mttga
Copy link
Collaborator

mttga commented Mar 15, 2024

Hi @hnekoeiq, no, we are still working on that. Our IQL/VDN implementations are baselines intended for simple environments, while the original C++ implementations are much more sophisticated and use many tricks. In the meantime, you can use IPPO, which is fast and converges.
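For readers unfamiliar with the distinction @mttga draws, here is a minimal JAX sketch (illustrative only, not JaxMARL's actual implementation) of the core difference between the IQL and VDN targets for two agents: IQL bootstraps each agent's TD target independently, while VDN decomposes a joint Q-value as the sum of per-agent Q-values and computes a single shared target.

```python
import jax.numpy as jnp

# Illustrative sketch, NOT the repo's code: TD targets for two agents.

def iql_targets(q1, q2, r, gamma=0.99):
    # IQL: each agent bootstraps from its own greedy Q, ignoring the other.
    t1 = r + gamma * jnp.max(q1)
    t2 = r + gamma * jnp.max(q2)
    return t1, t2

def vdn_target(q1, q2, r, gamma=0.99):
    # VDN: the joint Q is the sum of per-agent Qs, so there is a single
    # shared TD target built from the summed greedy values.
    return r + gamma * (jnp.max(q1) + jnp.max(q2))
```

With next-step Q-values `q1 = [1., 2.]`, `q2 = [0., 3.]`, reward 1 and gamma 0.5, IQL yields two targets (2.0 and 2.5) while VDN yields one shared target (3.5).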
