October 2019
Intermediate to advanced
366 pages
12h 4m
English
The algorithm is complete; however, the most interesting part has yet to be explained. In this section, we'll apply REINFORCE to LunarLander-v2, an episodic Gym environment in which the goal is to land a lunar lander.
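Because the environment is episodic, REINFORCE can wait until an episode ends and then compute the return for each step from the rewards it collected. The following is a minimal sketch of that calculation only, not the book's implementation: the function name `discounted_returns`, the discount factor `gamma`, and the reward values are all assumptions chosen for illustration, and no Gym call is involved.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one finished episode.

    Hypothetical helper for illustration; gamma=0.99 is an assumed default.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so that each return reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Made-up rewards for a three-step episode (not real LunarLander-v2 output).
episode_rewards = [1.0, 0.0, 2.0]
print(discounted_returns(episode_rewards, gamma=0.5))  # [1.5, 1.0, 2.0]
```

In a policy gradient update, each of these returns would weight the log-probability of the action taken at the corresponding step.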
The following is a screenshot of the game in its initial position and in a hypothetical successful final position:

This is a problem with discrete actions in which the lander has to land at coordinates (0,0), with a penalty if it lands far from that point. The lander receives a positive reward when it moves from the top of the screen to the bottom, but when it fires the engine to slow down, it loses 0.3 points on each ...