Selection

The first step of MCTS, selection, involves playing the game intelligently: the algorithm uses the experience it has accumulated so far to choose the next move from a given state. One method for choosing that move is called Upper Confidence Bound 1 applied to Trees (UCT). In short, this formula rates moves based on the following:

  • The mean reward of games where a given move was made
  • How often the move was selected

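Each node's rating can be expressed as follows (shown here in the standard UCB1 form):

$$U_i = \bar{X}_i + C \sqrt{\frac{\ln N}{n_i}}$$

Where:

  • $\bar{X}_i$: the mean reward for choosing move $i$ (for example, the win rate)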
  • : Is the number ...
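
The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: it assumes the standard UCB1 form, in which a move's score is its mean reward plus an exploration bonus of c * sqrt(ln N / n), where N is the number of times the parent state was visited and n is the number of times the move was selected. The names `uct_score` and `select_move`, the layout of the `stats` dictionary, and the default exploration constant c = sqrt(2) are all hypothetical choices made for this example.

```python
import math


def uct_score(mean_reward, child_visits, parent_visits, c=math.sqrt(2)):
    """Score one move with the standard UCB1 formula (assumed form)."""
    if child_visits == 0:
        # An untried move gets an infinite score so it is explored first.
        return math.inf
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_reward + exploration


def select_move(stats, c=math.sqrt(2)):
    """Pick the move with the highest UCT score.

    stats maps each move to (total_reward, visits); this layout is an
    assumption made for the example.
    """
    parent_visits = sum(visits for _, visits in stats.values())

    def score(move):
        total_reward, visits = stats[move]
        mean_reward = total_reward / visits if visits else 0.0
        return uct_score(mean_reward, visits, parent_visits, c)

    return max(stats, key=score)


if __name__ == "__main__":
    # Move "a" has the better win rate, but "b" has barely been tried.
    stats = {"a": (9.0, 10), "b": (1.0, 2)}
    print(select_move(stats, c=0.0))  # pure exploitation
    print(select_move(stats))         # exploitation plus exploration bonus
```

With c set to 0 the rule reduces to picking the best mean reward, so it favors "a". With the default constant, the rarely tried move "b" receives a larger exploration bonus and is selected instead, which is the balance between the two listed factors that UCT is meant to strike.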
