MCTS Notes

Hendrik's notes

4 Steps

Selection - You decide what actions do you want to take from a particular state.

Expand - Once you're at a place where you don't know which action to take, you're gonna expand the tree.

Simulate - Then you're gonna run simulations on that expansion.

Back up - You back up your new knowledge up the tree.

When you run out of time you use the best knowledge to have to make the decision.

Aayush's notes

These policies are based on heuristics employed in the 6,6 version of Kalah. Ideas taken from - https://blog.hackerrank.com/mancala/.

Move that gets most points in the current full turn. Break ties using move that left the most marbles on your side.
Overflow - how many stones are passed to the other side when own store is passed. Less overflow, the better.
Number of further back pieces.
Score - pieces in each cell
Extra - number of pieces captured that move.
3,4,5 combined into - scores[x]-(overflows[x]*0.3)**2+extra - we could probably treat this as a linear regression problem?
(my_mancala – your_mancala) + (marbles_on_my_side – marbles_on_your_side) - heuristic evaluation for deep search. Made use of alpha beta pruning.

The following ideas are taken from:

Journal entries