11
Actor-Critic Methods – A2C and A3C
So far, we have covered two types of methods for learning the optimal policy. One is the value-based method, and the other is the policy-based method. In the value-based method, we use the Q function to extract the optimal policy. In the policy-based method, we compute the optimal policy without using the Q function.
In this chapter, we will learn about another interesting method called the actor-critic method for finding the optimal policy. The actor-critic method makes use of both the value-based and policy-based methods. We will begin the chapter by understanding what the actor-critic method is and how it makes use of value-based and policy-based methods. We will acquire a basic understanding of actor-critic ...
Get Deep Reinforcement Learning with Python - Second Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.