GitHub - JulesVerny/PongConvolutionalDQN: Example of DQN Convolutional Learning to play Pong

Pong Game Reinforcement DQN Learning

DQN-based reinforcement learning for a simple PyGame Pong game. This Python-based RL experiment plays Pong, with a DQN agent controlling the left-hand paddle against a programmed right-hand paddle.

The objective is measured simply as the left paddle, which is trained and controlled by a DQN agent, successfully returning the ball. The programmed opponent is a pretty hot player, so success is simply the ability to return a ball served by Serena Williams. The moving average score lies in the range [-10, +10], from complete failure to return the ball to full success in returning it. This experiment demonstrates a DQN-based reinforcement learning agent that improves from poor performance (about -9.0) towards reasonably good performance (+9.9) in around 40,000 epochs.
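The README does not say how the moving average score is computed, so the following is only a minimal sketch of one way to get a score bounded by [-10, +10]. It assumes each rally is scored +10 when the left paddle returns the ball and -10 when it misses, averaged over a fixed window. The class name `MovingAverageScore` and the window of 100 rallies are hypothetical and not taken from the repository.

```python
from collections import deque


class MovingAverageScore:
    """Windowed mean of per-rally outcomes (hypothetical helper, not from the repo)."""

    def __init__(self, window=100):
        # Assumption: a fixed-size window; the oldest outcome drops out automatically.
        self.outcomes = deque(maxlen=window)

    def record(self, returned):
        """Record one rally: +10 if the ball was returned, -10 if it was missed."""
        self.outcomes.append(10.0 if returned else -10.0)
        return self.value()

    def value(self):
        """Current moving average, always within [-10, +10]; 0.0 before any rally."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)
```

With this scoring, an agent that misses every ball sits at -10.0, and one that returns every ball in the window sits at +10.0.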

This is a convolutional-network-based RL implementation: the agent learns from the game image state returned by pygame: ScreenImage = pygame.surfarray.array3d(pygame.display.get_surface()). This makes learning very slow, taking about 40,000 epochs to train (about 5 hours on a TensorFlow-enabled GPU). The best weights are then stored in BestPongModelWeights.h5 for use in subsequent agent play.

This DQN code takes the 400x400 screen image, reduces it to a 40x40 greyscale image using skimage image processing, and stacks it with the previous 3 images into a 40x40x4 input to the Keras-based convolutional network. The 'successful' network comprises 3 convolutional layers and 2 dense layers, which estimate Q for three actions (Stay, Up, Down).
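The image reduction and stacking described above can be sketched as follows. This is an illustration under stated assumptions, not the code in MyAgent.py: the names `preprocess` and `FrameStack` are hypothetical, and filling the stack with copies of the first frame is an assumed convention for the start of a game. The calls `skimage.color.rgb2gray` and `skimage.transform.resize` are real skimage functions.

```python
from collections import deque

import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize

FRAME_SIZE = (40, 40)  # reduced image size named in the README
STACK_DEPTH = 4        # current frame plus the previous 3


def preprocess(screen_image):
    """Reduce an RGB screen array (e.g. 400x400x3) to a 40x40 greyscale frame."""
    grey = rgb2gray(screen_image)                       # 2-D float array
    small = resize(grey, FRAME_SIZE, anti_aliasing=True)
    return small.astype(np.float32)


class FrameStack:
    """Keeps the last 4 processed frames (hypothetical helper, not from the repo)."""

    def __init__(self):
        self.frames = deque(maxlen=STACK_DEPTH)

    def push(self, screen_image):
        """Add a new screen image and return the stacked 40x40x4 network input."""
        frame = preprocess(screen_image)
        if not self.frames:
            # Assumption: at the start, fill the stack with copies of the first frame.
            for _ in range(STACK_DEPTH):
                self.frames.append(frame)
        else:
            self.frames.append(frame)  # oldest frame is discarded automatically
        return np.stack(self.frames, axis=-1)
```

In use, `FrameStack().push(ScreenImage)` would be called once per game step with the array from pygame.surfarray.array3d, and the result passed to the network as a single 40x40x4 state.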

Erratic Long Term Training

Note that I capture and abort the DQN training as soon as I see the training game performance approach and stay around +10.0 for the first time, regardless of any further epsilon decay. I have noticed that keeping the training going, with further epsilon decay, causes erratic declines in game performance followed by recoveries. I cannot explain these erratic declines. So it's good to keep a watch on training performance and not waste days expecting the ultimate performance.
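The stopping decision above is made by watching the training score by hand. As a rough sketch of how the same rule could be automated, the check below stops once the moving average score has stayed near +10.0 for a number of consecutive readings. The function name `should_stop`, the threshold of 9.5 and the patience of 20 readings are all hypothetical values chosen for illustration, not settings from TrainAgent.py.

```python
def should_stop(score_history, threshold=9.5, patience=20):
    """Return True once the last `patience` scores are all at or above `threshold`.

    score_history: list of moving average scores, oldest first.
    threshold, patience: illustrative assumptions, not values from the repo.
    """
    if len(score_history) < patience:
        return False
    return all(score >= threshold for score in score_history[-patience:])
```

A training loop could call this after each score update and save the weights before breaking out, so that a later erratic decline does not overwrite a good model.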

Usage

python TrainAgent.py : To Train the Agent up to the point where good performance is observed
python PlayBestAgent.py : To Play the Trained Agent (By loading the BestPongModelWeights.h5)
python PlotProgress.py : To check the Game Score Growth during the long hours of Training

The Experiment is based upon the following files:

MyPong.py : The pygame based Pong Game based upon Siraj Raval's code
MyAgent.py : The Convolutional DQN based agent using Ben Lau's Convolutional Flappy Bird DQN code as a source

Main Python Package Dependencies

pygame, keras (hence a TensorFlow or Theano backend), numpy, matplotlib, skimage

Acknowledgments:

The Pong Game Code is based upon Siraj Raval's inspiring videos on Machine Learning and Reinforcement Learning: https://github.com/llSourcell/pong_neural_network_live
The DQN Agent Software is based upon Ben Lau's source code: https://github.com/yanpanlau/Keras-FlappyBird
Daniel Slater's Blog & Examples: http://www.danielslater.net/2016/03/deep-q-learning-pong-with-tensorflow.html?showComment=1502902115538
WILDML Reinforcement Learning Summary (Examples): http://www.wildml.com/2016/10/learning-reinforcement-learning/

