Actor-critic methods

Approaches to reinforcement learning can be divided into three broad categories:

  • Value-based learning: This tries to learn the expected reward/value for being in a state. The desirability of getting into different states can then be evaluated based on their relative value. Q-learning in an example of value-based learning.
  • Policy-based learning: In this, no attempt is made to evaluate the state, but different control policies are tried out and evaluated based on the actual reward from the environment. Policy gradients are an example of that.
  • Model-based learning: In this approach, which will be discussed in more detail later in the chapter, the agent attempts to model the behavior of the environment and choose an action based ...

Get Python Deep Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.