Loss of plasticity in reinforcement learning

This directory contains the code to demonstrate and mitigate loss of plasticity in reinforcement learning problems from OpenAI Gym. The actor and critic networks are specified in ../net/policies.py and ../net/valuesf.py, respectively.

The configurations for individual experiments can be found in cfg. cfg/ant/std.yml specifies the parameters for standard PPO in the Ant-v3 environment. The following command can be used to perform one run for this configuration file. The -s parameter specifies the random seed for the experiment. A single run (for 50M time-steps) on a normal laptop takes about 24 CPU-hours.

python3.8 run_ppo.py -c cfg/ant/std.yml -s 0

The configuration files cfg/ant/ns.yml, cfg/ant/l2.yml, and cfg/ant/cbp.yml specify the parameters for PPO with proper Adam, PPO with L2 regularization, and PPO with continual backpropagation, respectively.
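Running every configuration over many seeds by hand is tedious, so a small launcher can help. The sketch below is not part of this repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes that seeds 0 to 29 are used and that the script is started from the directory containing run_ppo.py. By default it only prints the commands.

```python
import subprocess

# The four configuration files named in this README.
CONFIGS = [
    "cfg/ant/std.yml",
    "cfg/ant/ns.yml",
    "cfg/ant/l2.yml",
    "cfg/ant/cbp.yml",
]

# Assumption: one run per seed, with seeds 0..29.
NUM_SEEDS = 30


def build_commands(configs=CONFIGS, num_seeds=NUM_SEEDS):
    """Return one argv list per (config, seed) pair, grouped by config."""
    return [
        ["python3.8", "run_ppo.py", "-c", cfg, "-s", str(seed)]
        for cfg in configs
        for seed in range(num_seeds)
    ]


def run_all(commands, dry_run=True):
    """Print each command, or run it sequentially when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the batch as soon as one run fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: list the commands without starting any training.
    run_all(build_commands(), dry_run=True)
```

At about 24 CPU-hours per run, executing all 120 runs sequentially would take roughly 2,880 CPU-hours, so in practice the printed commands would more likely be handed to a cluster scheduler or run in parallel.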

After completing 30 runs for the four configuration files specified above, the commands below can be used to plot the left figure below. The generated figures will be in the plots directory.

cd plots/
python3.8 fig4a.py