Chapter 5. Tackling more complex problems with actor-critic methods

This chapter covers

  • The limitations of the REINFORCE algorithm
  • Introducing a critic to improve sample efficiency and decrease variance
  • Using the advantage function to speed up convergence
  • Speeding up training through parallelization

In the previous chapter we introduced a vanilla version of a policy gradient method called REINFORCE. This algorithm worked fine for the simple CartPole example, but we want to apply reinforcement learning to more complex environments. You already saw that deep Q-networks can be quite effective when the action space is discrete, but they have the drawback of needing a separate action-selection strategy, such as epsilon-greedy, on top of the learned action values. In this chapter you’ll learn how to address these limitations by combining a policy network (the actor) with a learned value function (the critic), a family of algorithms known as actor-critic methods.
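Before we work through the details, here is a minimal sketch of the kind of model this chapter builds toward: a two-headed network in PyTorch with a shared body, a policy ("actor") head that outputs action probabilities, and a value ("critic") head that estimates the state value. The layer sizes, names, and the CartPole-shaped input (4 state dimensions, 2 discrete actions) are illustrative assumptions, not the book's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Illustrative two-headed actor-critic network (sizes are assumptions)."""
    def __init__(self, state_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.shared = nn.Linear(state_dim, hidden)        # shared feature layer
        self.policy_head = nn.Linear(hidden, n_actions)   # actor: action logits
        self.value_head = nn.Linear(hidden, 1)            # critic: state value V(s)

    def forward(self, state):
        x = F.relu(self.shared(state))
        action_probs = F.softmax(self.policy_head(x), dim=-1)
        state_value = self.value_head(x)
        return action_probs, state_value

# Usage: sample an action directly from the policy distribution --
# no separate epsilon-greedy strategy is needed, unlike with a DQN.
model = ActorCritic()
state = torch.randn(1, 4)                       # a dummy CartPole-like observation
probs, value = model(state)
action = torch.distributions.Categorical(probs).sample()

The critic's value estimate is what will let us replace the raw return in the REINFORCE update with an advantage-style signal, which is the source of the variance reduction and faster convergence listed above.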
