October 2018
368 pages
To illustrate DQN, we use the CartPole-v0 environment of OpenAI Gym. CartPole-v0 is a pole-balancing problem in which the goal is to keep the pole from falling over. The environment is 2D. The action space consists of two discrete actions (move left and move right). The state space, however, is continuous and is made of four variables: the cart's position, the cart's velocity, the pole's angle, and the pole's angular velocity.
The CartPole-v0 environment is shown in Figure 9.6.1.
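The action and state spaces described above can be sketched as plain Python types. This is only an illustration using the standard library: the names `Action`, `CartPoleState`, and `as_state` are hypothetical helpers, not part of Gym, and the field order (cart position, cart velocity, pole angle, pole angular velocity) is assumed to follow the order of Gym's 4-element observation.

```python
from enum import IntEnum
from typing import NamedTuple, Sequence


class Action(IntEnum):
    """The two discrete actions; Gym encodes them as the integers 0 and 1."""
    LEFT = 0
    RIGHT = 1


class CartPoleState(NamedTuple):
    """Hypothetical wrapper that names the four continuous state variables."""
    cart_position: float
    cart_velocity: float
    pole_angle: float             # assumed to be in radians
    pole_angular_velocity: float


def as_state(observation: Sequence[float]) -> CartPoleState:
    """Convert a raw 4-element observation into a named state."""
    if len(observation) != 4:
        raise ValueError("expected 4 state variables, got %d" % len(observation))
    return CartPoleState(*(float(x) for x in observation))


# Example with made-up numbers standing in for an environment observation.
state = as_state([0.01, -0.02, 0.03, 0.04])
print(state)
print(list(Action))
```

Naming the fields makes it harder to mix up indices when the state is later fed to the Q-network, although the network itself still receives a flat vector of four floats.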
Initially, the pole is upright. A reward of +1 is given for every timestep that the pole remains upright. The episode ends when the pole tilts more than 15 degrees from the vertical or the cart moves more than 2.4 units from the center. The CartPole-v0 problem is considered solved if the average reward is at least 195.0 over 100 consecutive trials: ...
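As a rough sketch of the termination rule and the solved criterion described above, the following standard-library code checks both conditions. The names `episode_over` and `SolvedTracker` are hypothetical, the 15-degree and 2.4-unit limits are simply the figures quoted in the text (exposed as parameters, since an actual environment implementation may use different constants), and the pole angle is assumed to be given in radians.

```python
import math
from collections import deque

ANGLE_LIMIT_DEG = 15.0   # figure quoted in the text; treat as an assumption
POSITION_LIMIT = 2.4     # units from the center, as quoted in the text
SOLVED_AVERAGE = 195.0   # required average reward
WINDOW = 100             # number of consecutive trials


def episode_over(cart_position, pole_angle_rad,
                 angle_limit_deg=ANGLE_LIMIT_DEG,
                 position_limit=POSITION_LIMIT):
    """Return True if the pole has tilted or the cart has moved too far."""
    too_far = abs(cart_position) > position_limit
    too_tilted = abs(math.degrees(pole_angle_rad)) > angle_limit_deg
    return too_far or too_tilted


class SolvedTracker:
    """Keep the most recent episode rewards and test the solved criterion."""

    def __init__(self, window=WINDOW, threshold=SOLVED_AVERAGE):
        self.scores = deque(maxlen=window)  # old scores drop off automatically
        self.threshold = threshold

    def add(self, episode_reward):
        """Record one episode's total reward and report whether we are done."""
        self.scores.append(episode_reward)
        return self.solved()

    def solved(self):
        # Require a full window so early lucky episodes do not count.
        if len(self.scores) < self.scores.maxlen:
            return False
        return sum(self.scores) / len(self.scores) >= self.threshold


# Usage: with +1 per timestep, an episode's reward equals its length.
tracker = SolvedTracker()
for _ in range(WINDOW):
    tracker.add(200.0)
print(tracker.solved())
```

Because the reward is +1 per timestep, the total reward of an episode equals the number of timesteps the pole stayed up, so the tracker can be fed episode lengths directly.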