Q-learning in action

In this section, we'll combine Q-learning with a simple neural network to control an agent in the cart-pole task, using an ε-greedy policy and experience replay. Cart-pole is a classic RL problem: the agent must balance a pole attached to a cart via a joint. At every step, the agent can move the cart left or right, and it receives a reward of 1 for every time step that the pole stays balanced. If the pole deviates by more than 15 degrees from upright, the game ends:

The cart-pole task
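
Before moving on to the environment itself, it may help to see the three ingredients named above (the ε-greedy policy, experience replay, and the Q-learning target) in isolation. The following is a minimal sketch using only the Python standard library; it is not the implementation this section goes on to build. The names `ReplayBuffer`, `epsilon_greedy`, and `q_target`, as well as the discount factor `gamma=0.99`, are assumptions chosen for illustration, and Q-values are passed in as plain lists where the real agent would get them from the neural network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive time steps.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In cart-pole there are two actions (left and right), so `q_values` would hold two numbers per state. A high `epsilon` early in training favors exploration; lowering it over time lets the agent rely more on what it has learned.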

To help us with this, we'll use OpenAI Gym (https://gym.openai.com/), which is an open source toolkit for the development and comparison ...
