Q-learning in action

A game may run at roughly 16-60 frames per second, and rewards often depend on actions taken many seconds earlier. The state space is also vast. In computer games, the state consists of all the pixels on the screen that are used as input to the game. Even if we imagine a screen downsampled to, say, 80 x 80 pixels, each of them binary (black or white), that still gives 2^6400 possible states. This makes a direct map from state to reward impractical.
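To get a feel for the size of that number, the arithmetic can be checked directly. The following is a small sketch; the variable names are illustrative and not taken from the book.

```python
# Number of distinct screens for an 80 x 80 image of binary (black/white) pixels.
width, height = 80, 80
num_pixels = width * height      # 6400 pixels in total
num_states = 2 ** num_pixels     # each pixel is independently black or white

# Python integers have arbitrary precision, so the count can be inspected exactly.
num_digits = len(str(num_states))
print(f"{num_pixels} pixels give a state count with {num_digits} decimal digits")
```

A table with one entry per state is clearly out of reach at this scale, which is why the value of a state has to be estimated rather than looked up.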

Instead, we need to learn an approximation of the Q-function. This is where neural networks come in, thanks to their ability to act as universal function approximators. To train our Q-function approximation, we will store all the game states, rewards, and actions our agent took ...
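As a rough illustration of the idea, the sketch below stores transitions in a list and fits an approximate Q-function to them. It is not the book's implementation: a linear model in NumPy stands in for the neural network, the screen is shrunk to 4 x 4 so the numbers stay small, and every name and constant (STATE_SIZE, NUM_ACTIONS, GAMMA, LEARNING_RATE, remember, train_on_batch) is an assumption made for this example.

```python
import random

import numpy as np

STATE_SIZE = 4 * 4     # assumed: a tiny flattened 4 x 4 binary screen
NUM_ACTIONS = 3        # assumed: number of actions available to the agent
GAMMA = 0.99           # assumed: discount factor for future rewards
LEARNING_RATE = 0.01   # assumed: step size for the gradient update

rng = np.random.default_rng(0)
# One weight column per action; Q(state, action) = state . weights[:, action].
weights = rng.normal(0.0, 0.01, size=(STATE_SIZE, NUM_ACTIONS))

# Every transition the agent experiences is kept here for later training.
memory = []


def q_values(state):
    """Return the approximate Q-value of every action in the given state."""
    return state @ weights


def remember(state, action, reward, next_state, terminal):
    """Store one transition observed while playing."""
    memory.append((state, action, reward, next_state, terminal))


def train_on_batch(batch_size=32):
    """Move Q(state, action) toward the Q-learning target on a random sample."""
    batch = random.sample(memory, min(batch_size, len(memory)))
    for state, action, reward, next_state, terminal in batch:
        if terminal:
            target = reward
        else:
            target = reward + GAMMA * np.max(q_values(next_state))
        error = target - q_values(state)[action]
        # Gradient step on the squared error for a linear model.
        weights[:, action] += LEARNING_RATE * error * state
```

In use, the agent would call remember(...) after each step of the game and train_on_batch() periodically. Sampling stored transitions at random, rather than training only on the most recent frames, is one common way to use such a store.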
