
Can't reproduce DDPPO trained RL policy #517

Open
jiaming-ai opened this issue Sep 17, 2024 · 1 comment
jiaming-ai commented Sep 17, 2024

I tried to train an RL object navigation policy following the instructions. The only thing I changed is the camera configuration (HFOV, height, etc.) so that it matches the camera on our robot. Note: I didn't change the reward function, learning rate, network architecture, etc.
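
Concretely, the overrides were along these lines (a minimal sketch using the YACS-era habitat-lab config API; the config path and values are examples, and the exact key paths may differ depending on the habitat-lab commit home-robot pins):

```python
# Sketch only: the kind of camera overrides I applied (values are examples).
from habitat.config.default import get_config

config = get_config("configs/tasks/objectnav_hm3d.yaml")  # illustrative config path
config.defrost()
config.SIMULATOR.RGB_SENSOR.HFOV = 69                      # match the robot camera's HFOV
config.SIMULATOR.RGB_SENSOR.POSITION = [0.0, 1.31, 0.0]    # [x, y = camera height (m), z]
config.SIMULATOR.DEPTH_SENSOR.HFOV = 69
config.SIMULATOR.DEPTH_SENSOR.POSITION = [0.0, 1.31, 0.0]
config.freeze()
```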

I train the policy on 5 GPUs (each running 18 envs).
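
For reference, a run like this can be launched roughly as follows (a sketch only; flag names follow the YACS-era habitat_baselines interface and may differ in the pinned habitat-lab version, and `NUM_ENVIRONMENTS` is the per-GPU env count):

```
python -u -m torch.distributed.launch --use_env --nproc_per_node 5 \
    habitat_baselines/run.py \
    --exp-config habitat_baselines/config/objectnav/ddppo_objectnav.yaml \
    --run-type train \
    NUM_ENVIRONMENTS 18
```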

However, the agent does not seem to learn anything: it only learns to take the STOP action in order to avoid collision penalties. Please see the attached TensorBoard screenshots for details.

I tried several times (training from scratch) but none of the trials succeeded.

[screenshots: TensorBoard training curves]

I'm wondering if there are any tricks the home-robot team used to make it work.
Any help here is appreciated! @yvsriram @cpaxton

yvsriram (Contributor) commented Oct 2, 2024

Hey, we actually add the collision penalties and segmentation noise in the second stage of training. You can find the first-stage configs here: facebookresearch/habitat-lab@8037741
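
Schematically, the staging is along these lines (a sketch only; the names below are illustrative placeholders, not the actual fields in the linked configs):

```python
# Illustrative two-stage curriculum (placeholder names, not the real config keys).
STAGE_1 = {
    "collision_penalty": 0.0,   # stage 1: no collision penalty, so STOP isn't the easy optimum
    "use_gt_semantics": True,   # stage 1: clean ground-truth segmentation
}
STAGE_2 = {
    "collision_penalty": 0.3,   # stage 2: penalize collisions once navigation is learned
    "use_gt_semantics": False,  # stage 2: switch to noisy segmentation
}
```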
