Before we get into the finer details of how PPO works, we need to step back and understand how we measure the difference between two probability distributions. Remember that PG methods sample returns to estimate a distribution over actions and then use that distribution to find the optimum action, or the probability of the optimum action. Because of this, we can use a measure called KL divergence to determine how different two distributions are. Knowing how far apart the distributions are tells us how large a region of trust we can allow our optimization algorithm to explore within. PPO improves on this by clipping the objective function, using two policy networks.
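To make this concrete, the following is a minimal sketch of measuring the KL divergence between an old and a new action distribution. The use of PyTorch and the example probabilities are assumptions for illustration, not code from this book:

import torch
from torch.distributions import Categorical, kl_divergence

# Action probabilities from the old (pre-update) and new (post-update)
# policies for a single state with four discrete actions (illustrative values).
old_probs = torch.tensor([0.25, 0.25, 0.25, 0.25])
new_probs = torch.tensor([0.40, 0.30, 0.20, 0.10])

old_dist = Categorical(probs=old_probs)
new_dist = Categorical(probs=new_probs)

# KL(old || new): how far the new policy has drifted from the old one.
# A small value means the update stayed within a tight region of trust.
kl = kl_divergence(old_dist, new_dist)
print(f"KL divergence: {kl.item():.4f}")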
PPO and clipped objectives
Jonathan Hui has a number of insightful blog posts ...
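As a rough illustration of the clipped objective itself, the sketch below computes the PPO surrogate loss from probability ratios and advantages. The function name, tensor values, the clip value of 0.2, and the use of PyTorch are assumptions made for this example, not this book's implementation:

import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the new and old policies for each action taken.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped surrogate objective.
    surrogate = ratio * advantages
    # Clipped surrogate: the ratio cannot move outside [1 - eps, 1 + eps].
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (minimum) of the two and negate, since optimizers minimize.
    return -torch.min(surrogate, clipped).mean()

# Example with made-up values for a batch of three actions.
new_lp = torch.tensor([-0.9, -1.2, -0.4])
old_lp = torch.tensor([-1.0, -1.0, -1.0])
adv = torch.tensor([1.5, -0.5, 0.8])
print(ppo_clipped_loss(new_lp, old_lp, adv))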