Learning DDPG, TD3, and SAC
In the previous chapter, we learned about interesting actor-critic methods, such as Advantage Actor-Critic (A2C) and Asynchronous Advantage Actor-Critic (A3C). In this chapter, we will learn several state-of-the-art actor-critic methods. We will start off the chapter by understanding one of the popular actor-critic methods called Deep Deterministic Policy Gradient (DDPG). DDPG is used only in continuous environments, that is, environments with a continuous action space. We will understand what DDPG is and how it works in detail. We will also learn the DDPG algorithm step by step.
Going forward, we will learn about the Twin Delayed Deep Deterministic Policy Gradient (TD3). TD3 is an improvement over the DDPG algorithm ...