October 2019
Intermediate to advanced
366 pages
12h 4m
English
DDPG is regarded as one of the most sample-efficient actor-critic algorithms, but it has been shown to be brittle and sensitive to hyperparameters. Further studies have tried to alleviate these problems by introducing novel ideas or by applying tricks from other algorithms on top of DDPG. Recently, one algorithm has taken over as a replacement for DDPG: twin delayed deep deterministic policy gradient, or TD3 for short (the paper is Addressing Function Approximation Error in Actor-Critic Methods: https://arxiv.org/pdf/1802.09477.pdf). We use the word replacement here because TD3 is actually a continuation of the DDPG algorithm, with some more ingredients that make it more stable, ...
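The algorithm's name points to two of the ingredients TD3 adds on top of DDPG: twin critics and delayed actor updates. The TD3 paper adds a third, target policy smoothing. As a rough illustration, here is a minimal Python sketch of how the critic target and the update schedule change, assuming scalar states and actions and plain callables standing in for the target actor and the two target critics. The names `td3_target`, `should_update_actor`, `policy_noise`, `noise_clip`, and `policy_delay` are hypothetical and chosen for illustration. The defaults (0.99, 0.2, 0.5, 2) follow the values reported in the TD3 paper but should be treated as assumptions, not as this book's implementation.

```python
import random
from typing import Callable, Optional


def clip(x: float, low: float, high: float) -> float:
    """Clamp x into the closed interval [low, high]."""
    return max(low, min(high, x))


def td3_target(
    reward: float,
    next_state: float,
    done: bool,
    target_actor: Callable[[float], float],
    target_q1: Callable[[float, float], float],
    target_q2: Callable[[float, float], float],
    gamma: float = 0.99,
    policy_noise: float = 0.2,   # assumed std of the smoothing noise
    noise_clip: float = 0.5,     # assumed bound on that noise
    action_low: float = -1.0,
    action_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Critic target y for one transition, sketching the TD3 recipe."""
    if rng is None:
        rng = random.Random()
    # Target policy smoothing: perturb the target action with clipped Gaussian
    # noise, then keep the result inside the valid action range.
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, action_low, action_high)
    # Clipped double Q-learning: take the smaller of the twin critics'
    # estimates to reduce the overestimation bias of a single critic.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # Bootstrapped target; no bootstrapping past a terminal state.
    return reward + gamma * (0.0 if done else 1.0) * q_next


def should_update_actor(critic_step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: update the actor and the target networks
    only once every `policy_delay` critic updates."""
    return critic_step % policy_delay == 0


if __name__ == "__main__":
    # Hypothetical toy stand-ins for the target actor and the two target critics.
    def actor(s: float) -> float:
        return 0.5 * s

    def q1(s: float, a: float) -> float:
        return s + a

    def q2(s: float, a: float) -> float:
        return s + 2.0 * a

    y = td3_target(1.0, 1.0, False, actor, q1, q2, rng=random.Random(0))
    print(f"TD3 target: {y:.3f}")
    print([t for t in range(6) if should_update_actor(t)])
```

Running the script prints the target for one toy transition, then `[0, 2, 4]`: the critic steps at which the actor (and the target networks) would be updated with a delay of 2. The critics, by contrast, are updated at every step.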