October 2018
Beginner
362 pages
9h 32m
English
We've gone over how AlphaGo selects actions, so now let's get back to the core of any game-playing system: the game tree. AlphaGo uses game trees and MCTS, but its authors created a variant called asynchronous policy and value MCTS (APV-MCTS). Unlike the standard MCTS we discussed previously, APV-MCTS evaluates each newly expanded leaf node with two different methods: the value network's estimate of the position, and the outcome of a fast rollout played from that position to the end of the game.
The results of these methods are combined using a mixing parameter, λ. The algorithm then chooses an action according to the probabilities that were obtained during the initial supervised learning phase. While it may seem counterintuitive to use the probabilities ...
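As a rough illustration of the leaf evaluation and prior-guided selection described above, the following minimal sketch shows how the two evaluations might be blended with λ and how prior probabilities could steer the choice of action. It follows the form described in the AlphaGo paper, but the function names, the `c_puct` exploration constant, and the example numbers are hypothetical and chosen only for illustration; it is not AlphaGo's actual implementation.

```python
import math
from typing import Sequence


def mixed_leaf_value(value_net_estimate: float,
                     rollout_outcome: float,
                     lam: float = 0.5) -> float:
    """Blend the two leaf evaluations with a mixing parameter lambda.

    lam = 0 trusts only the value network; lam = 1 trusts only the rollout.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return (1.0 - lam) * value_net_estimate + lam * rollout_outcome


def select_action(q_values: Sequence[float],
                  priors: Sequence[float],
                  visit_counts: Sequence[int],
                  c_puct: float = 5.0) -> int:
    """Pick the action maximizing Q plus a prior-weighted exploration bonus.

    The bonus grows with the prior probability of the action (assumed here
    to come from the supervised policy network) and shrinks as the action
    accumulates visits. c_puct is an assumed exploration constant.
    """
    total_visits = sum(visit_counts)
    scores = [
        q + c_puct * p * math.sqrt(total_visits) / (1 + n)
        for q, p, n in zip(q_values, priors, visit_counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical numbers: the value network likes the position (0.8),
# but the rollout ended in a loss (-1.0).
leaf_value = mixed_leaf_value(0.8, -1.0, lam=0.5)

# Two equally valued actions; the one with the higher prior is chosen.
chosen = select_action([0.0, 0.0], [0.9, 0.1], [1, 1])
```

With λ set to 0.5, both evaluations carry equal weight, so an optimistic value-network estimate can be tempered by a lost rollout, and vice versa.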