Deep Q-learning

The Q-learning method that we've just seen solves the issue with iteration over the full set of states, but still can struggle with situations when the count of the observable set of states is very large. For example, Atari games can have a large variety of different screens, so if we decide to use raw pixels as individual states, we'll quickly realize that we have too many states to track and approximate values for.

In some environments, the count of different observable states could be almost infinite. For example, in CartPole the state given to us by the environment is four floating point numbers. The number of combinations of values is finite (they're represented as bits), but this number is extremely large. We could create ...

Get Deep Reinforcement Learning Hands-On now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.