October 2019
Intermediate to advanced
366 pages
12h 4m
English
We applied AC to LunarLander-v2, the same environment we used to test REINFORCE. Because it is an episodic game, it doesn't fully showcase the main strengths of the AC algorithm. Nonetheless, it is a good testbed, and you are free to try the algorithm in another environment.
We call the AC function with the following hyperparameters:
AC('LunarLander-v2', hidden_sizes=[64], ac_lr=4e-3, cr_lr=1.5e-2, gamma=0.99, steps_per_epoch=100, num_epochs=8000)
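As a rough illustration of what gamma and steps_per_epoch control, the sketch below computes bootstrapped rewards-to-go for a short fragment of a trajectory, which is the usual way an actor-critic method builds its targets when an epoch ends before the episode does. It is not the code behind the AC function: the names discounted_rewards, last_value, and values, and all the numbers, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def discounted_rewards(rewards, last_value, gamma=0.99):
    """Rewards-to-go for a trajectory fragment.

    last_value is the critic's estimate for the state that follows the
    fragment (assumed to be 0.0 when the fragment ends in a terminal state).
    """
    returns = [0.0] * len(rewards)
    running = last_value
    # Walk backwards so each step reuses the discounted sum of later steps.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 3-step fragment; gamma=0.5 keeps the numbers readable.
rewards = [1.0, 0.0, 2.0]
values = [2.0, 3.0, 5.0]  # made-up critic estimates for the visited states
targets = discounted_rewards(rewards, last_value=10.0, gamma=0.5)

# Advantage = bootstrapped target minus the critic's estimate.
advantages = [g - v for g, v in zip(targets, values)]

print(targets)     # [2.75, 3.5, 7.0]
print(advantages)  # [0.75, 0.5, 2.0]
```

In a setup like this, the targets would typically train the critic (learning rate cr_lr) and the advantages would weight the actor's policy-gradient step (learning rate ac_lr).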
Training with these hyperparameters produces the following plot of the total reward accumulated over the training epochs:

You can see that AC is faster than REINFORCE, as shown in the following ...