October 2019
For a direct comparison of TD3 and DDPG, we tested TD3 in the same environment that we used for DDPG: BipedalWalker-v2.
The best hyperparameters for TD3 for this environment are listed in this table:
| Hyperparameter | Actor l.r. | Critic l.r. | DNN Architecture | Buffer Size | Batch Size | Tau | Policy Update Freq | Sigma |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Value | 4e-4 | 4e-4 | [64,relu,64,relu] | 200000 | 64 | 0.005 | 2 | 0.2 |
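To make the roles of Tau, Policy Update Freq, and Sigma more concrete, the following is a minimal NumPy sketch of where each value would typically enter a TD3 update. It is not the implementation used to produce the results here. It assumes that Sigma is the standard deviation of the target-policy smoothing noise, and the names `TD3_CONFIG`, `soft_update`, `smoothed_target_action`, and `should_update_policy`, as well as the noise clip of 0.5, are hypothetical choices made for illustration only.

```python
import numpy as np

# Values copied from the table above; the key names are illustrative.
TD3_CONFIG = {
    "actor_lr": 4e-4,
    "critic_lr": 4e-4,
    "hidden_layers": [64, 64],   # two hidden layers, each followed by ReLU
    "buffer_size": 200_000,
    "batch_size": 64,
    "tau": 0.005,
    "policy_update_freq": 2,
    "sigma": 0.2,                # assumed: std of the target smoothing noise
}

NOISE_CLIP = 0.5  # assumption: not listed in the table


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def smoothed_target_action(action, sigma, noise_clip, low, high, rng):
    """Add clipped Gaussian noise to the target action, then clip to bounds."""
    noise = rng.normal(0.0, sigma, size=action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(action + noise, low, high)


def should_update_policy(step, policy_update_freq):
    """Delayed policy update: the actor is trained only every N critic steps."""
    return step % policy_update_freq == 0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # BipedalWalker actions are 4-dimensional and bounded in [-1, 1].
    target_action = np.array([0.9, -0.3, 0.0, 1.0])
    print(smoothed_target_action(
        target_action, TD3_CONFIG["sigma"], NOISE_CLIP, -1.0, 1.0, rng))
```

With a policy update frequency of 2, the critics are updated at every step while the actor and the target networks are updated at every second step, each target update moving the target weights only 0.5% of the way toward the online weights.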
The result is plotted in the following diagram. The curve follows a smooth trend, reaching good results after about 300K steps and peaking at around 450K steps of training. It comes very close to the goal of 300 points but never actually reaches it:
The time spent finding a good ...