September 2018
Intermediate to advanced
296 pages
9h 10m
English
Finally, the update step happens when the algorithm reaches a terminal state, or when either player wins or the game culminates in a draw. For each node/state of the board that was visited during this iteration, the algorithm updates the mean reward and increments the visit count of that state. This is also called backpropagation:

In the preceding diagram, since we reached a terminal state that returned 1 (a win), we increment the visit count and reward accordingly for each node along the path from the root node accordingly.
That concludes the four steps that occur in one MCTS iteration. As the name Monte Carlo suggests, ...
Read now
Unlock full access