Chapter 5. Tackling more complex problems with actor-critic methods

This chapter covers

  • The limitations of the REINFORCE algorithm
  • Introducing a critic to improve sample efficiency and decrease variance
  • Using the advantage function to speed up convergence
  • Speeding up training through parallelization

In the previous chapter we introduced a vanilla version of a policy gradient method called REINFORCE. This algorithm worked fine for the simple CartPole example, but we want to apply reinforcement learning to more complex environments. You already saw that deep Q-networks can be quite effective when the action space is discrete, but they have the drawback of needing a separate action-selection strategy, such as epsilon-greedy, on top of the learned action values. In this chapter you’ll learn how to address these limitations by combining a policy network (the actor) with a learned value function (the critic), a family of algorithms known as actor-critic methods.
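Before we work through the details, here is a minimal sketch of the kind of model this chapter builds toward: a two-headed network in PyTorch with a shared body, a policy ("actor") head that outputs action probabilities, and a value ("critic") head that estimates the state value. The layer sizes, names, and the CartPole-shaped input (4 state dimensions, 2 discrete actions) are illustrative assumptions, not the book's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Illustrative two-headed actor-critic network (sizes are assumptions)."""
    def __init__(self, state_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.shared = nn.Linear(state_dim, hidden)        # shared feature layer
        self.policy_head = nn.Linear(hidden, n_actions)   # actor: action logits
        self.value_head = nn.Linear(hidden, 1)            # critic: state value V(s)

    def forward(self, state):
        x = F.relu(self.shared(state))
        action_probs = F.softmax(self.policy_head(x), dim=-1)
        state_value = self.value_head(x)
        return action_probs, state_value

# Usage: sample an action directly from the policy distribution --
# no separate epsilon-greedy strategy is needed, unlike with a DQN.
model = ActorCritic()
state = torch.randn(1, 4)                       # a dummy CartPole-like observation
probs, value = model(state)
action = torch.distributions.Categorical(probs).sample()

The critic's value estimate is what will let us replace the raw return in the REINFORCE update with an advantage-style signal, which is the source of the variance reduction and faster convergence listed above.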
