Q-learning in action

In this section, we'll use Q-learning in combination with a simple neural network to control an agent in the cart-pole task, a classic RL problem. We'll use an ε-greedy policy and experience replay. The agent must balance a pole attached to a cart via a joint. At every step, the agent can move the cart left or right. It receives a reward of 1 for every time step in which the pole stays balanced. If the pole deviates by more than 15 degrees from upright, the game ends:

The cart-pole task
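Before we bring in any libraries, it may help to see the two ingredients named above, the ε-greedy policy and experience replay, in isolation. The following is a minimal sketch using only the Python standard library, not the implementation we'll build in this section. The names `epsilon_greedy`, `ReplayBuffer`, and `q_target`, as well as the discount factor `gamma`, are our own illustrative assumptions; the Q-values are passed in as a plain list (one entry per action: left and right), where later a neural network would supply them.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon):
    """Return a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        # Explore: pick any action uniformly at random
        return random.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayBuffer:
    """Fixed-size store of past transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample uniformly, without replacement, to decorrelate transitions
        size = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)


def q_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target: r if the episode ended, else r + gamma * max Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

With `epsilon` set to 0, the policy is purely greedy; with `epsilon` set to 1, it is purely random. In practice, ε often starts high and is decreased as training progresses, so that the agent explores early and exploits later.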

To help us with this, we'll use OpenAI Gym (https://gym.openai.com/), which is an open source toolkit for the development and comparison ...
