11

Policy Gradients – an Alternative

In this first chapter of part three of the book, we will consider an alternative way to handle Markov decision process (MDP) problems, one that gives rise to a whole family of methods called policy gradient methods.

In this chapter, we will:

  • Give an overview of these methods, their motivation, and their strengths and weaknesses compared to the already familiar Q-learning
  • Start with a simple policy gradient method called REINFORCE, apply it to our CartPole environment, and compare the results with the deep Q-network (DQN) approach

Values and policy

Before we start talking about policy gradients, let's refresh our memory of the common characteristics of the methods covered in part two of this book. The central ...
