10
Policy Gradient Methods
In this chapter, we're going to introduce algorithms that directly optimize the policy network in reinforcement learning. These algorithms are collectively referred to as policy gradient methods. Since the policy network is directly optimized during training, the policy gradient methods belong to the family of on-policy reinforcement learning algorithms. Like value-based methods, which we discussed in Chapter 9, Deep Reinforcement Learning, policy gradient methods can also be implemented as deep reinforcement learning algorithms.
A fundamental motivation in studying the policy gradient methods is addressing the limitations of Q-learning. We'll recall that Q-learning is about selecting the action that maximizes the ...
Get Advanced Deep Learning with TensorFlow 2 and Keras - Second Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.