Reinforcement Learning with TensorFlow

by Sayon Dutta
April 2018
Content level: Intermediate to advanced
334 pages
10h 18m
English
Packt Publishing
Content preview from Reinforcement Learning with TensorFlow

Trust region policy optimization

Trust region policy optimization (TRPO) is an iterative method for optimizing policies, and it is effective for large nonlinear policies such as neural networks. TRPO restricts the policy search space by constraining how far the output policy distribution can move in a single update. To do this, it measures the KL divergence between the old and the new policy distributions and limits each update of the policy network parameters so that this divergence stays small. This KL divergence constraint between the new and the old policy is called the trust region constraint. Because of this constraint, large changes in the policy distribution do not occur in a single step, which stabilizes training and helps the policy network converge early.
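The trust region constraint can be illustrated with a minimal sketch. This is not the book's implementation and not the full TRPO algorithm: it assumes a categorical policy for a single state whose "parameters" are simply its action logits, and it uses a plain backtracking search that shrinks a proposed update until the KL divergence between the old and the new policy falls below a threshold. The names `trust_region_step`, `proposed_step`, `max_kl`, `shrink`, and `max_backtracks` are hypothetical, chosen only for this example. A real TRPO update would also compute the step direction with conjugate gradient and check that the surrogate objective improves.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def trust_region_step(old_logits, proposed_step, max_kl=0.01,
                      shrink=0.5, max_backtracks=10):
    """Scale down a proposed logit update until KL(old || new) <= max_kl.

    Hypothetical helper: returns the accepted logits and their KL value,
    or the unchanged old logits if no scaled step satisfies the constraint.
    """
    old_probs = softmax(old_logits)
    scale = 1.0
    for _ in range(max_backtracks):
        new_logits = [l + scale * d for l, d in zip(old_logits, proposed_step)]
        kl = kl_divergence(old_probs, softmax(new_logits))
        if kl <= max_kl:
            return new_logits, kl  # step lies inside the trust region
        scale *= shrink  # step too large: shrink it and try again
    return list(old_logits), 0.0  # reject the update entirely


# Assumed toy values: three actions and a deliberately large update.
old_logits = [1.0, 0.5, -0.5]
proposed_step = [2.0, -1.0, -1.0]

new_logits, kl = trust_region_step(old_logits, proposed_step)
print(f"accepted KL = {kl:.5f}")
print("old policy:", [round(p, 3) for p in softmax(old_logits)])
print("new policy:", [round(p, 3) for p in softmax(new_logits)])
```

The full proposed step would change the action probabilities sharply, so the search keeps halving it until the new distribution stays close to the old one. That bounded change per update is what the trust region constraint enforces.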

TRPO ...

Publisher Resources

ISBN: 9781788835725