October 2019
Intermediate to advanced
366 pages
12h 4m
English
How well will the scalable version of evolution strategies perform in the LunarLander environment? Let's find out!
As you may recall, we already applied A2C and REINFORCE to LunarLander in Chapter 6, Learning Stochastic and PG optimization. The task is to land a spacecraft on the moon using continuous actions. We chose this environment for its medium difficulty and so that we can compare the ES results with those obtained with A2C.
The hyperparameters that performed the best in this environment are as follows:
| Hyperparameter | Variable name | Value |
| --- | --- | --- |
| Neural network size | hidden_sizes | [32, 32] |
| Training iterations (or generations) | number_iter | 200 |
| Number of workers | num_workers | 4 |
| Adam learning ... |
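
To make the table more concrete, here is a minimal sketch that collects the three hyperparameters shown above in a small configuration object and derives the size of the policy network they imply. The `ESConfig` class and the helpers `layer_shapes` and `count_parameters` are hypothetical names introduced only for illustration, and the observation and action dimensions (8 and 2) are assumed to be those of the continuous LunarLander environment in Gym. Rows of the table that are cut off are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ESConfig:
    """Hypothetical container; field names follow the 'Variable name' column."""
    hidden_sizes: List[int] = field(default_factory=lambda: [32, 32])
    number_iter: int = 200   # training iterations (generations)
    num_workers: int = 4     # parallel workers
    # The remaining rows of the table are truncated and intentionally omitted.


def layer_shapes(obs_dim: int, act_dim: int,
                 hidden_sizes: List[int]) -> List[Tuple[int, int]]:
    """Return (inputs, outputs) for each dense layer of a plain MLP."""
    sizes = [obs_dim, *hidden_sizes, act_dim]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes: List[Tuple[int, int]]) -> int:
    """Count weights plus biases over all dense layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


# Assumption: 8 observation values and 2 continuous action components.
OBS_DIM, ACT_DIM = 8, 2

config = ESConfig()
shapes = layer_shapes(OBS_DIM, ACT_DIM, config.hidden_sizes)
n_params = count_parameters(shapes)

print(shapes)    # [(8, 32), (32, 32), (32, 2)]
print(n_params)  # 1410
```

Under these assumptions the policy has 1,410 parameters. This number is worth keeping in mind for ES, because the algorithm perturbs the whole parameter vector at once, so the noise vector each worker samples has this length.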