Monte Carlo Tree Search (MCTS) is a powerful algorithm for reinforcement learning in strategic games, where we want to find strong strategies without specifying beforehand what a strong player should look like. Instead, the algorithm learns only from its own simulations which game states are preferable, balancing the exploration of new strategies against the exploitation of its current knowledge. This method is the basis of Google DeepMind's AlphaGo, which defeated the world's top human Go players. The self-invented game “Tupitu”, described in detail in the next section, is used here to illustrate the algorithm.
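The exploration/exploitation balance is typically realised through the UCT (UCB1 applied to trees) selection rule. Below is a minimal Python sketch of that rule; the `Node` class, its fields, and the exploration constant `c` are illustrative assumptions, not the actual data structures used in this repository.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0            # how often this node was visited
    total_reward: float = 0.0  # sum of simulation results backed up through it
    children: list = field(default_factory=list)

def uct_score(child, parent_visits, c=1.41):
    """UCB1 score: average reward (exploitation) plus a bonus that grows
    for children that have been visited only rarely (exploration)."""
    if child.visits == 0:
        return float("inf")  # always try unvisited moves first
    return (child.total_reward / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def select_child(parent):
    """Selection step of MCTS: descend into the child with the best UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))
```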
The game Tupitu is played on a 5x5 grid with the two players A and B positioned as shown in the figure. The players take turns, with player A moving first. In the first half of a turn, the player must move to one of the up to 8 adjacent tiles, just like a king in chess. In the second half of the turn, the player chooses one of the tiles on which no player is standing; this tile is removed for the rest of the game, meaning that no player can move onto it anymore. The first player who cannot move on their turn loses.
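To make the rules concrete, here is a small Python sketch of move generation using a plain nested-list board (`True` meaning the tile is still in play). It is purely illustrative and not the repository's bitboard implementation; the names `king_moves` and `removable_tiles` are assumptions.

```python
from itertools import product

SIZE = 5  # Tupitu is played on a 5x5 grid

def king_moves(pos, board, opponent):
    """First half of a turn: the up to 8 neighbouring tiles (like a chess king)
    that are still in play and not occupied by the opponent."""
    r, c = pos
    for dr, dc in product((-1, 0, 1), repeat=2):
        if (dr, dc) == (0, 0):
            continue
        nr, nc = r + dr, c + dc
        if 0 <= nr < SIZE and 0 <= nc < SIZE and board[nr][nc] and (nr, nc) != opponent:
            yield (nr, nc)

def removable_tiles(board, player, opponent):
    """Second half of a turn: any remaining tile on which no player stands.
    `player` is the mover's position after the king step."""
    return [(r, c) for r in range(SIZE) for c in range(SIZE)
            if board[r][c] and (r, c) not in (player, opponent)]
```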
Using Knuth's algorithm for deciding two-player games with perfect information, I have also implemented a perfect solver for this game here. In around 24 hours of computation it proves that the starting player A can always force a win with optimal play.
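Conceptually, such a solver computes win/loss values by memoised minimax: the player to move wins exactly when some (step, tile-removal) pair leads to a position that is lost for the opponent. The sketch below illustrates this idea in plain Python; it is far too slow and memory-hungry for the full 5x5 board and is not the repository's optimised solver.

```python
from functools import lru_cache

DIRS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

@lru_cache(maxsize=None)
def wins(tiles, me, opp):
    """True if the player standing on `me` (and to move) can force a win.
    `tiles` is a frozenset of the remaining (row, col) tiles, including the
    squares both players currently occupy. Illustrative only."""
    for dr, dc in DIRS:
        step = (me[0] + dr, me[1] + dc)
        if step not in tiles or step == opp:
            continue  # off the board, already removed, or occupied by the opponent
        for removed in tiles:
            if removed in (step, opp):
                continue  # may only remove a tile on which no player stands
            if not wins(tiles - {removed}, opp, step):
                return True  # this move leaves the opponent in a lost position
    return False  # no legal move at all, or every move lets the opponent win
```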
For maximal efficiency, all game logic is implemented using pure bitwise operations. With MCTSvsRandom you can simulate games between the MCTS player and a player making random moves, where one observes a practically 100% win rate for the MCTS player. You can also have two MCTS players play against each other by using MCTSvsMCTS.
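As an illustration of the bitboard idea (not the repository's actual code), the 25 tiles can be packed into a single integer so that all king-step targets are found with a handful of shifts and masks; all names below are hypothetical.

```python
# Bit (5*row + col) is set while the corresponding tile is still in play.
FULL = (1 << 25) - 1
NOT_LEFT  = FULL & ~sum(1 << (5 * r) for r in range(5))      # exclude column 0
NOT_RIGHT = FULL & ~sum(1 << (5 * r + 4) for r in range(5))  # exclude column 4

def king_targets(pos_bit, remaining, opponent_bit):
    """Bitmask of all tiles reachable with a king step from `pos_bit`,
    restricted to tiles still in play and not occupied by the opponent."""
    horiz = pos_bit | ((pos_bit & NOT_LEFT) >> 1) | ((pos_bit & NOT_RIGHT) << 1)
    nbrs = (horiz | (horiz << 5) | (horiz >> 5)) & FULL & ~pos_bit
    return nbrs & remaining & ~opponent_bit

# Example: player on (2, 2) = bit 12, full board, opponent on (0, 0) = bit 0
print(bin(king_targets(1 << 12, FULL, 1 << 0)))
```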
(c) Mia Muessig