The first step of MCTS involves playing the game intelligently: the algorithm draws on the experience it has gathered so far to determine the next move from a given state. One method for determining the next move is called Upper Confidence Bound 1 Applied to Trees (UCT). In short, this formula rates moves based on the following:
- The mean reward of games where a given move was made
- How often the move was selected
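
As a rough illustration of how those two quantities can be combined, the sketch below uses the standard UCB1 score: the mean reward plus an exploration bonus that shrinks as a move is selected more often. The names `uct_score` and `select_move`, the `stats` layout (move mapped to total reward and visit count), and the default exploration constant `c = sqrt(2)` are assumptions made for this example, not something the text prescribes.

```python
import math


def uct_score(mean_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score for one move (hypothetical helper for illustration).

    mean_reward   -- average reward of games in which the move was made
    visits        -- how often the move was selected
    parent_visits -- how often the parent state was visited
    c             -- exploration constant (sqrt(2) is an assumed default)
    """
    if visits == 0:
        # An untried move gets an infinite score so it is explored first.
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_move(stats, c=math.sqrt(2)):
    """Pick the move with the highest UCB1 score.

    stats maps each move to a (total_reward, visits) pair; the parent
    visit count is taken to be the sum of the children's visits.
    """
    parent_visits = sum(visits for _, visits in stats.values())

    def score(move):
        total_reward, visits = stats[move]
        mean = total_reward / visits if visits else 0.0
        return uct_score(mean, visits, parent_visits, c)

    return max(stats, key=score)


if __name__ == "__main__":
    stats = {"a": (9.0, 10), "b": (1.0, 2)}
    print(select_move(stats))         # exploration bonus included
    print(select_move(stats, c=0.0))  # pure exploitation: mean reward only
```

With `c = 0` the choice depends on mean reward alone. A larger `c` favors moves that have been tried less often, which is the trade-off the two bullet points describe.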
In the standard UCB1 form, each node's rating can be expressed as follows:

$$\bar{x}_i + C \sqrt{\frac{\ln N}{n_i}}$$

- $\bar{x}_i$: the mean reward for choosing move $i$ (for example, the win rate)
- : Is the number ...