This repository hosts my (ongoing) implementations of RL algorithms. I am using:
- Gymnasium for the environments.
- PyTorch for implementing the algorithms.
Implementations:
- I have implemented Policy Gradient with Baseline in vpg.py. The weights of the trained policy are stored in saved-models. You can run test.py to run the agent with the saved policy.
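
As a rough illustration of what "Policy Gradient with Baseline" computes, here is a minimal sketch in plain Python. It is not the code in vpg.py: the function names, the discount factor `gamma`, and the use of plain lists are assumptions made for illustration. In a PyTorch version, the log-probabilities would be tensors that carry gradients, and the advantages would be treated as constants.

```python
def returns_to_go(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1} for each step of one episode.

    gamma=0.99 is an assumed default, not a value taken from vpg.py.
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each step can reuse the return of the next step.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, baselines):
    """Subtract a per-step baseline (for example a value estimate) from each return."""
    return [g - b for g, b in zip(returns, baselines)]


def surrogate_loss(log_probs, advs):
    """Negative mean of log pi(a_t | s_t) * advantage_t over the episode.

    Minimizing this quantity with respect to the policy parameters
    follows the policy-gradient direction.
    """
    return -sum(lp * a for lp, a in zip(log_probs, advs)) / len(advs)
```

Subtracting a baseline leaves the expected gradient unchanged while usually reducing its variance, which is the reason for using one.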
Results from the trained vpg model: rl-demo.webm
Average reward before training = 9.5 / 500
Average reward after training = 499.3 / 500