Learning DDPG, TD3, and SAC

In the previous chapter, we learned about interesting actor-critic methods, such as Advantage Actor-Critic (A2C) and Asynchronous Advantage Actor-Critic (A3C). In this chapter, we will learn several state-of-the-art actor-critic methods. We will start off the chapter by understanding one of the popular actor-critic methods called Deep Deterministic Policy Gradient (DDPG). DDPG is used only in continuous environments, that is, environments with a continuous action space. We will understand what DDPG is and how it works in detail. We will also learn the DDPG algorithm step by step.

Going forward, we will learn about the Twin Delayed Deep Deterministic Policy Gradient (TD3). TD3 is an improvement over the DDPG algorithm ...

Get Deep Reinforcement Learning with Python - Second Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.