Python Reinforcement Learning Projects by Rajalingappaa Shanmugamani, Yang Wenzhuo, Sean Saito

Combining neural networks and MCTS

In AlphaGo, the policy and value networks are combined with MCTS to provide look-ahead search when selecting actions in a game. Previously, we discussed how MCTS keeps track of the mean reward and the number of visits for each node. AlphaGo keeps track of a few more values:

  • Q(s, a): The mean action value of choosing a particular action
  • P(s, a): The probability of taking an action in a given board state, as predicted by the larger supervised learning policy network
  • V: The value evaluation of a state that ...
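
To make the bookkeeping concrete, here is a minimal Python sketch of how a search tree might store the per-action statistics described above: the visit count, the mean action value, and the probability from the policy network. The names `EdgeStats`, `select_action`, and `c_puct` are hypothetical and chosen for illustration; they are not taken from the book's code. The selection score follows the general form used in AlphaGo-style search (mean value plus an exploration bonus that grows with the network probability and shrinks with the visit count), with a `+ 1` under the square root added here as an assumption so that the probabilities break ties before any visits have been made.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Statistics kept for one action from a board state (illustrative names)."""
    prior: float               # probability of the action from the policy network
    visit_count: int = 0       # number of times the action was taken in search
    total_value: float = 0.0   # sum of evaluations backed up through the action

    @property
    def mean_value(self) -> float:
        # Mean action value; defined as 0 for an action never visited.
        if self.visit_count == 0:
            return 0.0
        return self.total_value / self.visit_count

    def update(self, value: float) -> None:
        # Record one simulation result that passed through this action.
        self.visit_count += 1
        self.total_value += value


def select_action(edges: dict, c_puct: float = 1.0):
    """Pick the action with the highest mean value plus exploration bonus."""
    total_visits = sum(e.visit_count for e in edges.values())

    def score(e: EdgeStats) -> float:
        # Assumption: "+ 1" keeps the bonus non-zero before the first visit.
        bonus = c_puct * e.prior * math.sqrt(total_visits + 1) / (1 + e.visit_count)
        return e.mean_value + bonus

    return max(edges, key=lambda action: score(edges[action]))


edges = {"a": EdgeStats(prior=0.7), "b": EdgeStats(prior=0.3)}
first_choice = select_action(edges)   # no visits yet, so the probability decides
for _ in range(3):
    edges["a"].update(-1.0)           # three poor evaluations for action "a"
second_choice = select_action(edges)  # the search now prefers the other action
```

In this sketch, the network probability dominates early in the search, while the mean action value takes over as visits accumulate, which is the balance the combined approach is meant to achieve.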
