Deep Learning and the Game of Go

Chapter 10. Reinforcement learning with policy gradients

This chapter covers

Improving game play with policy gradient learning
Implementing policy gradient learning in Keras
Tuning optimizers for policy gradient learning

Chapter 9 showed you how to make a Go-playing program play against itself and save the results in experience data. That’s the first half of reinforcement learning; the next step is to use experience data to improve the agent so that it wins more often. The agent from the previous chapter used a neural network to select which move to play. As a thought experiment, imagine you shift every weight in the network by a random amount. Then the agent will select different moves. Just by luck, some of those new moves will be better ...
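The thought experiment above can be tried out on a toy scale. The following is a minimal sketch, not the agent from the book: it assumes a hypothetical one-layer linear policy over a made-up feature vector, and the names `move_probabilities` and `perturb` are illustrative only. It shifts every weight by a random amount and shows that the resulting move probabilities change.

```python
import numpy as np


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def move_probabilities(weights, board_features):
    # Hypothetical one-layer policy: one score per candidate move.
    return softmax(weights @ board_features)


def perturb(weights, scale, rng):
    # Shift every weight by an independent Gaussian amount.
    return weights + rng.normal(0.0, scale, size=weights.shape)


rng = np.random.default_rng(0)
num_moves, num_features = 9, 16  # arbitrary toy sizes, not a real Go board
weights = rng.normal(size=(num_moves, num_features))
board_features = rng.normal(size=num_features)

original = move_probabilities(weights, board_features)
shifted = move_probabilities(perturb(weights, 0.5, rng), board_features)

print("original policy: ", np.round(original, 3))
print("perturbed policy:", np.round(shifted, 3))
print("preferred move before:", int(np.argmax(original)))
print("preferred move after: ", int(np.argmax(shifted)))
```

Whether the perturbed policy is better or worse cannot be read off the probabilities alone; in this sketch that would have to be judged by playing games with both sets of weights and comparing the results.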

Deep Learning and the Game of Go by Kevin Ferguson, Max Pumperla
