N. Sanghi, Deep Reinforcement Learning with Python, https://doi.org/10.1007/979-8-8688-0273-7_8

8. Policy Gradient Algorithms

Nimish Sanghi¹

(1) Bangalore, India

Up to now, the book has focused on model-based and model-free methods. All the algorithms using these methods estimate the state or state-action values for a given current policy as the first step. In the second step, these estimated values are used to find a better policy by choosing the best action in a given state. These two steps are carried out in a loop until no further improvement in values is observed. In this chapter, you look at a different approach for learning optimal policies, by directly ...
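The two-step loop described above (estimate values for the current policy, then choose the best action in each state, and repeat until nothing changes) can be sketched on a toy problem. This is a minimal illustration, not code from the book: the four-state corridor, the reward of -1 per move, the discount factor, and the names `step`, `evaluate`, `improve`, and `policy_iteration` are all assumptions made for the example.

```python
# Hypothetical toy problem (not from the book): a 4-state corridor,
# where state 3 is terminal and every move costs 1.
N_STATES, TERMINAL, GAMMA = 4, 3, 0.9
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic model: return (next_state, reward)."""
    if action == 0:
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, TERMINAL)
    return next_state, -1.0


def evaluate(policy, tol=1e-8):
    """Step 1: estimate state values for the current policy."""
    values = [0.0] * N_STATES  # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(TERMINAL):
            s2, r = step(s, policy[s])
            new_v = r + GAMMA * values[s2]
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:
            return values


def improve(values):
    """Step 2: choose the best action in each state under the estimated values."""
    policy = []
    for s in range(TERMINAL):
        q = []
        for a in ACTIONS:
            s2, r = step(s, a)
            q.append(r + GAMMA * values[s2])
        policy.append(q.index(max(q)))  # ties go to the first action
    return policy


def policy_iteration():
    """Alternate the two steps until the policy stops changing."""
    policy = [0] * TERMINAL  # arbitrary start: always move left
    while True:
        values = evaluate(policy)
        new_policy = improve(values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


if __name__ == "__main__":
    final_policy, final_values = policy_iteration()
    print(final_policy, final_values)
```

In this sketch the policy is never represented directly; it is always derived from the value estimates, which is the pattern the chapter contrasts with its own approach.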


Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models by Nimish Sanghi
