December 2018
684 pages
The preceding hyperparameter settings enable the agent to solve the environment within 290 episodes. The left panel of the following diagram shows the episode rewards and their moving average over 100 episodes. The right panel shows the decay of exploration and the number of steps per episode. There is a stretch of some 100 episodes that often take 1,000 time steps each while the agent reduces exploration and learns to fly, before it starts to land fairly consistently:

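The 100-episode moving average of rewards described above can be computed along the following lines. This is a minimal sketch rather than the code used to produce the diagram: the function names `moving_average` and `first_solved_episode` are hypothetical, and the threshold of 200 is an assumption about what counts as solving the environment, so adjust it to the criterion you actually use.

```python
import numpy as np


def moving_average(rewards, window=100):
    """Trailing mean of episode rewards over up to `window` most recent episodes."""
    rewards = np.asarray(rewards, dtype=float)
    # Prefix sums with a leading zero so that csum[i] is the sum of the first i rewards
    csum = np.cumsum(np.insert(rewards, 0, 0.0))
    end = np.arange(1, len(rewards) + 1)
    start = np.maximum(end - window, 0)
    # Early episodes average over fewer than `window` values
    return (csum[end] - csum[start]) / (end - start)


def first_solved_episode(rewards, threshold=200.0, window=100):
    """1-based episode at which the full-window average first reaches `threshold`.

    Returns None if that never happens. The default threshold is an assumption.
    """
    ma = moving_average(rewards, window)
    # Only consider episodes for which a full window of history is available
    hits = np.flatnonzero(ma[window - 1:] >= threshold)
    return int(hits[0]) + window if hits.size else None
```

Calling `first_solved_episode(episode_rewards)` on the list of rewards collected during training gives the episode count to compare against the figure quoted above. Episodes before the first full window are ignored so that a few lucky early episodes cannot register as a solution.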