Bayesian Q-Learning - Algorithm for multiagent reinforcement learning

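This project applies Bayesian Q-Learning to a continuous version of the Prisoner's Dilemma game, in which each agent chooses a degree of cooperation rather than a binary cooperate/defect move. The agents converge to the Nash equilibrium solution (mutual defection).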
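
The idea can be illustrated with a minimal sketch; it is not the code in this repository. Everything in it is an assumption made for illustration: the payoff function (`benefit * other - cost * own`), the discretization of the cooperation level into five values, the stateless treatment of the game (so each Q-value is just the expected immediate payoff of an action), and the Gaussian posterior with known observation variance. Actions are chosen by sampling one value per action from its posterior and playing the largest (Thompson sampling). The names `payoff`, `BayesianQAgent`, and `simulate` are hypothetical.

```python
import math
import random


def payoff(own, other, benefit=2.0, cost=1.0):
    """Assumed continuous PD payoff: gain from the other's cooperation minus own cost."""
    return benefit * other - cost * own


class BayesianQAgent:
    """Keeps an independent Gaussian posterior over the Q-value of each action."""

    def __init__(self, actions, rng, prior_mean=0.0, prior_precision=1.0, obs_var=0.25):
        self.actions = list(actions)
        self.mean = [prior_mean] * len(self.actions)
        self.precision = [prior_precision] * len(self.actions)
        self.obs_precision = 1.0 / obs_var  # assumed known reward-noise precision
        self.rng = rng

    def act(self):
        # Thompson sampling: draw one Q-value per action, play the argmax.
        samples = [
            self.rng.gauss(m, 1.0 / math.sqrt(p))
            for m, p in zip(self.mean, self.precision)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, index, reward):
        # Conjugate Gaussian update with known observation variance.
        old = self.precision[index]
        new = old + self.obs_precision
        self.mean[index] = (old * self.mean[index] + self.obs_precision * reward) / new
        self.precision[index] = new


def simulate(rounds=5000, n_levels=5, seed=0):
    """Two agents repeatedly play the game; returns the agents and the action history."""
    rng = random.Random(seed)
    levels = [k / (n_levels - 1) for k in range(n_levels)]  # 0.0 = defect, 1.0 = cooperate
    agents = [BayesianQAgent(levels, rng), BayesianQAgent(levels, rng)]
    history = []
    for _ in range(rounds):
        i, j = agents[0].act(), agents[1].act()
        ci, cj = levels[i], levels[j]
        agents[0].update(i, payoff(ci, cj))
        agents[1].update(j, payoff(cj, ci))
        history.append((ci, cj))
    return agents, history


if __name__ == "__main__":
    agents, history = simulate()
    late = history[-500:]
    print("mean cooperation, last 500 rounds:",
          sum(c for c, _ in late) / len(late),
          sum(c for _, c in late) / len(late))
```

Under the assumed payoff, lowering one's own cooperation always raises one's own payoff regardless of what the other agent does, so full defection is the dominant strategy even though mutual cooperation pays both agents more. As the posteriors sharpen, both agents in this sketch should therefore drift toward a cooperation level of 0.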