Bayesian Q-Learning applied to a continuous version of the Prisoner's Dilemma. The agents converge to the Nash equilibrium (mutual defection).
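To make the setup concrete, here is a minimal sketch of one way such an experiment could look. It is not the project's actual implementation, and everything in it is an assumption: the linear payoff (`BENEFIT`, `COST`), the discretized action grid `ACTIONS`, and the simplified Bayesian treatment (a Normal belief per action with known noise variance `NOISE_VAR`, plus Thompson sampling) in place of a fuller Bayesian Q-Learning model. All names are hypothetical.

```python
import random

# Assumed linear payoff for a continuous Prisoner's Dilemma (not from the source):
# each player picks a cooperation level c in [0, 1]; cooperating costs the player
# COST * c and gives the opponent BENEFIT * c, with BENEFIT > COST > 0.
BENEFIT, COST = 2.0, 1.0
ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]  # discretized cooperation levels (assumption)
NOISE_VAR = 1.0  # assumed known reward-noise variance


def payoff(own, other):
    """Reward to a player choosing `own` against an opponent choosing `other`."""
    return BENEFIT * other - COST * own


def best_response(other):
    """Cooperation level on the grid that maximizes the payoff against `other`."""
    return max(ACTIONS, key=lambda a: payoff(a, other))


def posterior_update(mean, var, reward):
    """Conjugate Normal update of a Q-value belief after observing one reward."""
    new_var = 1.0 / (1.0 / var + 1.0 / NOISE_VAR)
    new_mean = new_var * (mean / var + reward / NOISE_VAR)
    return new_mean, new_var


class BayesianQAgent:
    """Stateless agent holding one Normal belief (mean, variance) per action."""

    def __init__(self, rng):
        self.rng = rng
        self.beliefs = [(0.0, 1.0) for _ in ACTIONS]

    def act(self):
        # Thompson sampling: draw one Q-value per action, play the largest draw.
        draws = [self.rng.gauss(m, v ** 0.5) for m, v in self.beliefs]
        return max(range(len(ACTIONS)), key=draws.__getitem__)

    def learn(self, action_index, reward):
        mean, var = self.beliefs[action_index]
        self.beliefs[action_index] = posterior_update(mean, var, reward)

    def greedy(self):
        """Action with the highest posterior mean."""
        means = [m for m, _ in self.beliefs]
        return ACTIONS[max(range(len(ACTIONS)), key=means.__getitem__)]


def simulate(steps=5000, seed=0):
    """Let two agents play repeatedly; return each agent's greedy action."""
    rng = random.Random(seed)
    a, b = BayesianQAgent(rng), BayesianQAgent(rng)
    for _ in range(steps):
        i, j = a.act(), b.act()
        a.learn(i, payoff(ACTIONS[i], ACTIONS[j]))
        b.learn(j, payoff(ACTIONS[j], ACTIONS[i]))
    return a.greedy(), b.greedy()
```

Under this assumed payoff, `best_response` returns 0.0 for every opponent action, so mutual defection is the only Nash equilibrium even though mutual full cooperation pays each player more. Calling `simulate()` returns both agents' preferred cooperation levels after learning; whether and how quickly they settle on 0.0 depends on the seed and the number of steps.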