Python Deep Learning

by Valentino Zocca, Gianmario Spacagna, Daniel Slater, Peter Roelants
April 2017
Intermediate to advanced
406 pages
10h 15m
English
Packt Publishing

Policy gradients in AlphaGo

To train AlphaGo with policy gradients, the network was set up to play games against itself. Every time step received a reward of 0 except the final one, where the game is either won or lost, giving a reward of 1 or -1. This final reward is then applied to every time step of the game, and the network is trained using policy gradients in the same way as in our Tic-tac-toe example. To prevent overfitting, games were played against a randomly selected previous version of the network. If the network only ever plays against its current self, it risks converging on very niche strategies that would not work against varied opponents, a local minimum of sorts.
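
The reward scheme and opponent selection described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not AlphaGo's actual implementation: a state-independent softmax policy over three moves stands in for the network, and the names `per_step_rewards`, `reinforce_update`, and `pick_opponent` are hypothetical helpers introduced here.

```python
import math
import random


def per_step_rewards(num_moves, final_reward):
    """Apply the game's final reward (+1 win, -1 loss) to every move played."""
    return [final_reward] * num_moves


def softmax(logits):
    """Turn raw scores into move probabilities (numerically stable)."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, actions, rewards, learning_rate=0.1):
    """One policy gradient (REINFORCE) step for a toy softmax policy.

    Assumption: the policy ignores the board state, so the gradient of
    log pi(action) with respect to the logits is one_hot(action) - probs.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for action, reward in zip(actions, rewards):
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            grad[i] += reward * (indicator - probs[i])
    # Gradient ascent: moves from won games become more likely.
    return [w + learning_rate * g for w, g in zip(logits, grad)]


def pick_opponent(previous_versions, rng=random):
    """Choose a randomly selected earlier version of the network to play."""
    return rng.choice(previous_versions)


# Example: a won game in which move 2 was played twice and move 0 once.
actions = [2, 2, 0]
rewards = per_step_rewards(len(actions), final_reward=1)
new_logits = reinforce_update([0.0, 0.0, 0.0], actions, rewards)
opponent = pick_opponent(["version_1", "version_2", "version_3"])
```

After a win, the logit of the most frequently played move rises the most, while a move that was never played loses probability mass. With `final_reward=-1` the same update pushes in the opposite direction.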

Building the initial supervised learning network that predicted ...

ISBN: 9781786464453