Some of our group members are working on reinforcement learning projects. A few of them are described below.
Andrej Karpathy's Pong code (below) was the base for this project. Instead of Pong, however, we are trying to get the policy gradient algorithm working on tic-tac-toe (TTT). We are using the TTT simulation that is part of the MiniWoB platform in OpenAI's Gym/Universe. Various changes are being made (such as using a softmax at the last layer) to get TTT working.
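To illustrate the softmax change, here is a minimal NumPy sketch of a softmax output layer for a nine-cell board. It is not the project's actual code: the names `softmax` and `sample_action`, the choice of one logit per board cell, and the all-zero example logits are assumptions made for illustration. The gradient of the log-probability with respect to the logits is the standard softmax result, one-hot(action) minus the probabilities.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)  # subtract the max to avoid overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp)

def sample_action(logits, rng):
    """Sample an action and return it with d log pi(action) / d logits."""
    probs = softmax(logits)
    action = rng.choice(len(probs), p=probs)
    # For a softmax policy the gradient is one_hot(action) - probs.
    grad_logp = -probs
    grad_logp[action] += 1.0
    return action, grad_logp

rng = np.random.default_rng(0)
logits = np.zeros(9)  # assumption: one logit per TTT cell, untrained network
action, grad_logp = sample_action(logits, rng)
```

In a policy gradient update, `grad_logp` would be scaled by the (discounted) reward for the episode and backpropagated through the rest of the network. A real TTT agent would probably also need to handle cells that are already occupied, which this sketch does not do.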
Trains Pong using policy gradients, based on Andrej Karpathy's GitHub gist (https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5), with potentially slight changes due to the Python version. The logic behind this work is explained in his wonderful blog post at http://karpathy.github.io/2016/05/31/rl/ .
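One step in this style of policy gradient training is turning the sparse per-step rewards into discounted returns. The sketch below shows one way to do that in NumPy. It is an illustration written for this page rather than a copy of the gist: the function names, the default `gamma=0.99`, and the rule of resetting the running sum at every nonzero reward (which suits Pong, where a nonzero reward marks the end of a point) are assumptions.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Return discounted returns for a 1-D sequence of per-step rewards."""
    rewards = np.asarray(rewards, dtype=np.float64)
    discounted = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the rewards that follow it.
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0  # assumption: a nonzero reward ends a point (Pong-specific)
        running = running * gamma + rewards[t]
        discounted[t] = running
    return discounted

def normalize(returns):
    """Standardize returns to zero mean and roughly unit variance."""
    return (returns - returns.mean()) / (returns.std() + 1e-8)

returns = discount_rewards([0, 0, 1], gamma=0.5)  # array([0.25, 0.5, 1.0])
```

The normalized returns are then typically used to weight the log-probability gradient of each action taken, so actions that preceded a win are made more likely and those that preceded a loss less likely.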