
Can't reproduce DDPPO trained RL policy #517

Open
jiaming-ai opened this issue Sep 17, 2024 · 1 comment
jiaming-ai commented Sep 17, 2024

I tried to train an RL object navigation policy following the instructions. The only thing I changed is the camera configuration (HFOV, height, etc.) so that it matches the camera on our robot. Note: I didn't change the reward function, learning rate, network architecture, etc.
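
Concretely, the overrides were along these lines (a minimal sketch using the YACS-era habitat-lab config API; the config path and values are examples, and the exact key paths may differ depending on the habitat-lab commit home-robot pins):

```python
# Sketch only: the kind of camera overrides I applied (values are examples).
from habitat.config.default import get_config

config = get_config("configs/tasks/objectnav_hm3d.yaml")  # illustrative config path
config.defrost()
config.SIMULATOR.RGB_SENSOR.HFOV = 69                      # match the robot camera's HFOV
config.SIMULATOR.RGB_SENSOR.POSITION = [0.0, 1.31, 0.0]    # [x, y = camera height (m), z]
config.SIMULATOR.DEPTH_SENSOR.HFOV = 69
config.SIMULATOR.DEPTH_SENSOR.POSITION = [0.0, 1.31, 0.0]
config.freeze()
```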

I train the policy on 5 GPUs (each running 18 envs).
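
For reference, a run like this can be launched roughly as follows (a sketch only; flag names follow the YACS-era habitat_baselines interface and may differ in the pinned habitat-lab version, and `NUM_ENVIRONMENTS` is the per-GPU env count):

```
python -u -m torch.distributed.launch --use_env --nproc_per_node 5 \
    habitat_baselines/run.py \
    --exp-config habitat_baselines/config/objectnav/ddppo_objectnav.yaml \
    --run-type train \
    NUM_ENVIRONMENTS 18
```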

However, the agent does not seem to learn anything: it only learns to take the STOP action in order to avoid collision penalties. Please see the attached TensorBoard screenshots for details.

I tried several times (training from scratch) but none of the trials succeeded.

[screenshots: TensorBoard training curves]

I'm wondering if there are any tricks the home-robot team used to make it work.
Any help here is appreciated! @yvsriram @cpaxton

yvsriram (Contributor) commented Oct 2, 2024

Hey, we actually add the collision penalties and segmentation noise in the second stage of training. You can find the first-stage configs here: facebookresearch/habitat-lab@8037741
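
Schematically, the staging is along these lines (a sketch only; the names below are illustrative placeholders, not the actual fields in the linked configs):

```python
# Illustrative two-stage curriculum (placeholder names, not the real config keys).
STAGE_1 = {
    "collision_penalty": 0.0,   # stage 1: no collision penalty, so STOP isn't the easy optimum
    "use_gt_semantics": True,   # stage 1: clean ground-truth segmentation
}
STAGE_2 = {
    "collision_penalty": 0.3,   # stage 2: penalize collisions once navigation is learned
    "use_gt_semantics": False,  # stage 2: switch to noisy segmentation
}
```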
