N. Sanghi, Deep Reinforcement Learning with Python, https://doi.org/10.1007/979-8-8688-0273-7_9

9. Combining Policy Gradient and Q-Learning

Nimish Sanghi¹

(1) Bangalore, India

So far, this book has combined deep learning with reinforcement learning in two ways: Chapters 6 and 7 explained deep Q-learning and its variants, and Chapter 8 covered policy gradients. Neural network training requires many iterations, and because Q-learning is an off-policy approach, it lets you reuse sampled transitions multiple times, which makes it sample efficient. However, Q-learning can be unstable at times. Further, it is an indirect way of learning. Instead of learning an ...
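The sample reuse that off-policy learning allows can be illustrated with a small replay buffer. This is a minimal sketch, not code from the book: the `ReplayBuffer` class, its method names, and the toy integer transitions are all assumptions made for illustration. It shows only that the same stored transitions can be drawn into several training minibatches.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest transition once capacity is reached
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling, without replacement inside a single batch
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Toy data: states are plain integers, purely for illustration
buffer = ReplayBuffer(capacity=100, seed=0)
for t in range(10):
    buffer.add(state=t, action=t % 2, reward=1.0, next_state=t + 1, done=(t == 9))

# Draw five minibatches from the same ten stored transitions.
# 20 draws from 10 transitions means some transitions are necessarily reused.
batches = [buffer.sample(batch_size=4) for _ in range(5)]
total_drawn = sum(len(batch) for batch in batches)
```

An on-policy method would typically discard these transitions after a single update, because they were generated by an older version of the policy.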

Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models by Nimish Sanghi
