© Nimish Sanghi 2021
N. Sanghi, Deep Reinforcement Learning with Python, https://doi.org/10.1007/978-1-4842-6809-4_8

8. Combining Policy Gradient and Q-Learning

Nimish Sanghi, Bangalore, India

So far in this book, in the context of deep reinforcement learning, we have looked at deep Q-learning and its variants in Chapter 6 and at policy gradients in Chapter 7. Training a neural network requires many iterations, and Q-learning, being an off-policy approach, lets us reuse the same transitions multiple times, which makes it sample efficient. However, Q-learning can be unstable at times. It is also an indirect way of learning: instead of learning an optimal policy directly, we first learn q-values and then use these action values to learn ...
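The two properties of Q-learning mentioned above, reuse of stored transitions and a policy obtained indirectly from q-values, can be illustrated with a minimal sketch. It is not code from the book: the names ReplayBuffer and greedy_action, the tuple layout of a transition, and the toy values are all assumptions made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # Oldest transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Sampling does not remove anything from the buffer, so the same
        # transition can be drawn again in later batches (off-policy reuse).
        return random.sample(list(self.buffer), batch_size)


def greedy_action(q_values):
    """Indirect policy: return the index of the largest estimated q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Fill the buffer with toy transitions (assumed layout, illustration only).
buffer = ReplayBuffer(capacity=100)
for t in range(10):
    buffer.add((t, t % 2, 1.0, t + 1, False))

# Two training batches drawn from the same stored experience.
batch_1 = buffer.sample(4)
batch_2 = buffer.sample(4)

# The policy is not learned directly; it is read off the q-value estimates.
action = greedy_action([0.1, 0.7, 0.3])
```

Here the buffer still holds all ten transitions after both batches are drawn, which is what allows repeated training passes over the same experience, and the action is chosen only by comparing q-value estimates rather than by a separately learned policy.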
