O'Reilly logo

Deep Reinforcement Learning Hands-On by Maxim Lapan

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

The AlphaGo Zero method

Overview

At a high level, the method consists of three components, all of which will be explained in detail later, so don't worry if something is not completely clear from this section:

  • We traverse constantly the game tree, using the Monte-Carlo Tree Search (MCTS) algorithm, the core idea of which is to semi-randomly walk down the game states, expanding them and gathering statistics about the frequency of moves and underlying game outcomes. As the game tree is huge, both in terms of the depth and width, we're not trying to build the full tree, just randomly sampling the most promising paths of it (that's the source of the method's name).
  • At every moment, we have a best player, which is the model used to generate the data via ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required