Actor-Critic algorithms
Actor-Critic (AC) algorithms are on-policy policy gradient algorithms that also learn a value function (generally a Q-function) called a critic to provide feedback to the policy, the actor. Imagine that you, the actor, want to go to the supermarket via a new route. Unfortunately, before arriving at the destination, your boss calls you requiring you to go back to work. Because you didn't reach the supermarket, you don't know if the new road is actually faster than the old one. But if you reached a familiar location, you can estimate the time you'll need to go from there to the supermarket and calculate whether the new path is preferable. This estimate is what the critic does. In this way, you can improve the actor even ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access