© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2024
N. Sanghi, Deep Reinforcement Learning with Python, https://doi.org/10.1007/979-8-8688-0273-7_8

8. Policy Gradient Algorithms

Nimish Sanghi
Bangalore, India

Up to now, the book has focused on model-based and model-free methods. All the algorithms built on these methods begin by estimating the state or state-action values under the current policy. They then use those estimates to find a better policy by choosing the best action in each state. The two steps repeat in a loop until the values stop improving. In this chapter, you look at a different approach for learning optimal policies, by directly ...
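As a reminder of the value-based loop just described (estimate values for the current policy, act greedily on them, repeat until nothing changes), here is a minimal sketch. It is not taken from the book: the four-state chain environment, the constants, and every function name (`step`, `evaluate`, `improve`, `policy_iteration`) are assumptions chosen only for illustration.

```python
# Hypothetical toy problem: a chain of states 0..3, where state 3 is terminal.
N_STATES = 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor (assumed value)


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def evaluate(policy, sweeps=100):
    """Step 1: estimate state values for the current policy."""
    values = [0.0] * N_STATES  # the terminal state's value stays 0
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            s2, r = step(s, policy[s])
            values[s] = r + GAMMA * values[s2]
    return values


def improve(values):
    """Step 2: choose the greedy action in every non-terminal state."""
    policy = []
    for s in range(N_STATES - 1):
        q = []
        for a in ACTIONS:
            s2, r = step(s, a)
            q.append(r + GAMMA * values[s2])
        policy.append(q.index(max(q)))  # ties go to the first action
    return policy


def policy_iteration():
    """Alternate the two steps until the policy stops changing."""
    policy = [0] * (N_STATES - 1)  # start from "always move left"
    while True:
        values = evaluate(policy)
        new_policy = improve(values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


if __name__ == "__main__":
    final_policy, final_values = policy_iteration()
    print(final_policy, final_values)
```

In this sketch the policy is never represented on its own terms; it is always derived from the value estimates. That dependence is what distinguishes the value-based methods from an approach that learns the policy directly.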
