October 2019
PPO and TRPO are very similar algorithms, so we compare them by testing PPO in the same environment as TRPO, namely RoboschoolWalker2d. We devoted the same computational resources to tuning both algorithms to make the comparison fairer. The hyperparameters for TRPO are the same as those listed in the previous section, while the hyperparameters of PPO are shown in the following table:
| Hyperparameter | Value |
| --- | --- |
| Neural network | 64, tanh, 64, tanh |
| Policy learning rate | 3e-4 |
| Number of actor iterations | 10 |
| Number of agents | 1 |
| Time horizon | 5,000 |
| Mini-batch size | 256 |
| Clipping coefficient | 0.2 |
| Lambda (for GAE) | 0.95 |
| Gamma (for GAE) | 0.99 |
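
To make the role of these values more concrete, the following is a minimal NumPy sketch of the two places where the clipping coefficient and the GAE values enter a PPO update. It is not the implementation used for the experiments: the dictionary keys and the function names `gae_advantages` and `clipped_surrogate` are assumptions chosen for illustration, and the sketch assumes a single trajectory with no episode boundary inside it.

```python
import numpy as np

# Values copied from the table above; the key names are illustrative only.
PPO_HPARAMS = {
    "hidden_sizes": (64, 64),
    "activation": "tanh",
    "policy_lr": 3e-4,
    "actor_iterations": 10,
    "num_agents": 1,
    "time_horizon": 5000,
    "minibatch_size": 256,
    "clip_eps": 0.2,
    "gae_decay": 0.95,  # the 0.95 GAE value from the table
    "gamma": 0.99,
}


def gae_advantages(rewards, values, last_value, gamma, decay):
    """Generalized Advantage Estimation for one trajectory.

    Assumes no episode terminates inside the trajectory; `last_value`
    is the value estimate used to bootstrap after the final step.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.append(np.asarray(values, dtype=np.float64), last_value)
    advantages = np.zeros_like(rewards)
    running = 0.0
    # Walk backward so each step reuses the discounted sum of later residuals.
    for t in reversed(range(len(rewards))):
        td_residual = rewards[t] + gamma * values[t + 1] - values[t]
        running = td_residual + gamma * decay * running
        advantages[t] = running
    return advantages


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps):
    """PPO clipped surrogate objective (to be maximized), averaged over a batch."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    advantages = np.asarray(advantages, dtype=np.float64)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the element-wise minimum removes the incentive to move the
    # probability ratio outside [1 - clip_eps, 1 + clip_eps].
    return float(np.mean(np.minimum(unclipped, clipped)))


adv = gae_advantages(
    rewards=[1.0, 1.0],
    values=[0.0, 0.0],
    last_value=0.0,
    gamma=PPO_HPARAMS["gamma"],
    decay=PPO_HPARAMS["gae_decay"],
)
objective = clipped_surrogate(
    new_logp=np.log([1.5]),
    old_logp=[0.0],
    advantages=[1.0],
    clip_eps=PPO_HPARAMS["clip_eps"],
)
```

In a full training loop, the objective would typically be recomputed on mini-batches of 256 transitions for each of the 10 actor iterations over the 5,000 collected steps, with gradients taken by an automatic differentiation library rather than plain NumPy.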
A comparison between PPO and TRPO is shown in the ...