I've designed a custom navigation env that has obstacles in it, and the agents don't hit each other.
When I run my mappo_ippo.py I get strange outputs. Could my model be overfitting?
My custom env is:
https://drive.google.com/file/d/1yw1rOpJcmoU99zcz-2wEGqV_ZnfOT1qF/view?usp=sharing
My mappo_ippo config is:
max_steps: 200
n_iters: 625
n_agents and n_targets: 3
backend: csv
entropy_eps: 0.0001
The remaining configs are unchanged.
When I look at my CSV and my videos, the results are surprising:
in the videos, the agents reach the goals very easily for roughly the first 20 epochs, but after that they stop.
in the CSV, train_mean_reward increases non-stop, but my critic loss also increases.
Does this mean my model is overfitting?
@matteobettini
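As a first diagnostic (not from the thread): the critic regresses value targets whose magnitude grows with the return, so a rising critic loss alongside a rising reward is not by itself evidence of overfitting. Below is a minimal sketch for eyeballing the two curves together, assuming the csv backend wrote a pandas-readable file; the path and column names are illustrative guesses, so check them against your actual CSV header.

```python
# Minimal sketch: plot logged mean reward and critic loss side by side.
# The file path and column names are assumptions -- adapt them to the
# header of the CSV your run actually produced.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("csv_logs/mappo_ippo/scalars.csv")  # hypothetical path

fig, ax1 = plt.subplots()
ax1.plot(df["train/reward_mean"], color="tab:blue")
ax1.set_xlabel("iteration")
ax1.set_ylabel("mean reward", color="tab:blue")

# Second axis: if critic loss grows roughly in step with the return
# magnitude, it likely reflects value-target scale, not overfitting.
ax2 = ax1.twinx()
ax2.plot(df["train/loss_critic"], color="tab:red")
ax2.set_ylabel("critic loss", color="tab:red")
plt.show()
```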
Replies: 2 comments
-
I'm tagging Matteo, who is our PoC for MARL things and the owner of the mappo_ippo script!
-
It could be that the reward for not colliding is taking over and preventing navigation success. The reward increasing is generally a good sign, provided the reward function makes sense. But in general I am not able to make diagnostic comments about your custom environment, sorry.
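One way to test this hypothesis is to log each reward term separately in the custom env, so the CSV shows which component drives train_mean_reward. A minimal sketch follows; every name in it (goal_dist, n_collisions, the shaping values) is an illustrative assumption, not taken from the actual env.

```python
# Minimal sketch: return navigation and no-collision reward terms
# separately so each can be logged per iteration. Names and shaping
# values are illustrative assumptions about the custom env.
import torch

def reward_components(goal_dist: torch.Tensor, n_collisions: torch.Tensor):
    """goal_dist: per-agent distance to target; n_collisions: per-agent
    collision count this step. Both are hypothetical env quantities."""
    nav = -goal_dist                                 # closer -> higher reward
    no_collide = (n_collisions == 0).float() * 0.1   # bonus for staying safe
    return nav, no_collide

# In the env's reward computation, log both terms alongside the total:
#   nav, no_collide = reward_components(goal_dist, n_collisions)
#   info["rew_nav"] = nav.mean().item()
#   info["rew_no_collide"] = no_collide.mean().item()
#   reward = nav + no_collide
```

If the no-collision term accounts for most of the increase while the navigation term plateaus, that matches the hypothesis above: the agents may have learned that standing still is a safe way to collect reward instead of reaching the goals.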