Chapter 10. Reinforcement learning with policy gradients

This chapter covers

  • Improving game play with policy gradient learning
  • Implementing policy gradient learning in Keras
  • Tuning optimizers for policy gradient learning

Chapter 9 showed you how to make a Go-playing program play against itself and save the results in experience data. That’s the first half of reinforcement learning; the next step is to use experience data to improve the agent so that it wins more often. The agent from the previous chapter used a neural network to select which move to play. As a thought experiment, imagine you shift every weight in the network by a random amount. Then the agent will select different moves. Just by luck, some of those new moves will be better ...
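
The thought experiment above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the agent here is a toy linear policy rather than the Keras network from chapter 9, and the names select_move and perturb, the feature vectors, and the noise scale are all hypothetical. It shifts every weight by a random amount and counts how many positions then get a different move.

```python
import random


def select_move(weights, features):
    """Return the index of the highest-scoring move under a toy linear policy.

    weights[m] holds one weight per input feature for move m (hypothetical layout).
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def perturb(weights, scale, rng):
    """Shift every weight by an independent Gaussian random amount."""
    return [[w + rng.gauss(0.0, scale) for w in row] for row in weights]


rng = random.Random(0)  # fixed seed so the run is repeatable
num_moves, num_features = 5, 8

# Stand-ins for a trained network and for encoded board positions.
weights = [[rng.gauss(0.0, 1.0) for _ in range(num_features)]
           for _ in range(num_moves)]
positions = [[rng.gauss(0.0, 1.0) for _ in range(num_features)]
             for _ in range(100)]

original_moves = [select_move(weights, p) for p in positions]
new_weights = perturb(weights, scale=0.5, rng=rng)
new_moves = [select_move(new_weights, p) for p in positions]

changed = sum(a != b for a, b in zip(original_moves, new_moves))
print(f"{changed} of {len(positions)} positions now get a different move")
```

Nothing in this sketch says whether the new moves are better or worse; it only shows that a random shift in the weights changes which moves the agent selects.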
