Hands-On Reinforcement Learning with Python

by Sudharsan Ravichandiran
June 2018
Content level: Intermediate to advanced
318 pages
9h 24m
English
Packt Publishing

Trust Region Policy Optimization

Before we can understand Trust Region Policy Optimization (TRPO), we need to understand constrained policy optimization. In reinforcement learning (RL), agents learn by trial and error to maximize reward. To find the best policy, an agent explores different actions and favors those that yield a high reward. While exploring, it will very likely try bad actions as well. The biggest challenge arises when we let agents learn in the real world and the reward function is not properly designed. For example, consider an agent learning to walk without hitting any obstacles. The agent will receive a negative reward if it hits an obstacle and a ...


Publisher Resources

ISBN: 9781788836524