11 Policy Gradients – an Alternative

In this first chapter of part three of the book, we will consider an alternative way to handle Markov decision process (MDP) problems: a whole family of methods called policy gradient methods.

In this chapter, we will:

Give an overview of the methods, their motivations, and their strengths and weaknesses compared with the already familiar Q-learning
Start with a simple policy gradient method called REINFORCE and try to apply it to our CartPole environment, comparing this with the deep Q-network (DQN) approach

Values and policy

Before we start talking about policy gradients, let's review the common characteristics of the methods covered in part two of this book. The central ...

Deep Reinforcement Learning Hands-On - Second Edition by Maxim Lapan
