Building Machine Learning Systems with Python - Third Edition
by Luis Pedro Coelho, Willi Richert, Matthieu Brucher
Q-network
Actually, Deepmind started making a name for themselves before Go. They were using what is called a Q-network to solve Atari games. These are a set of simple games where the gamer can play only up to 10 moves at each stage.
With these networks, the goal is to estimate a long-term reward function (like the number of points) and which move will maximize it. By feeding in enough options at the beginning, the network will progressively learn how to play better and better. The reward function is the following:
Q(s,a) = r + γ(max(Q(s',a'))
r is the reward, γ is a discounting factor (future gains are not as important as the immediate reward), s is the current state of the game, and a is the action we could take.
Of course, as it is continuously ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access