Get full access to Hands-On Reinforcement Learning for Games and 60K+ other titles, with a free 10-day trial of O'Reilly.

Introducing REINFORCE

The first algorithm we will look at is known as REINFORCE. It introduces the concept of policy gradients (PG) in a very elegant manner, especially in PyTorch, whose automatic differentiation hides many of the mathematical complexities of the implementation. REINFORCE also solves the optimization problem in reverse: instead of maximizing the expected return with gradient ascent, it negates the objective so that the problem can be expressed as a loss function and minimized with gradient descent. The update equation then becomes the following:

Here, we now assume the following:

This is the advantage over the baseline expressed by ; we will get to the advantage function in more ...
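The loss-based view of REINFORCE described above can be sketched in plain Python. This is a minimal, dependency-free sketch, not the book's implementation. It assumes a hypothetical tabular softmax policy (`theta[s][a]` holds the logit of action `a` in state `s`) and uses the mean episode return as the baseline b, which is one common choice; the text only says that a baseline is subtracted. The gradient is derived by hand here, where PyTorch's autograd would normally compute it. The helper names (`discounted_returns`, `reinforce_loss_and_grad`, `gradient_descent_step`) are illustrative.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_loss_and_grad(theta, episode, gamma=0.99):
    """REINFORCE loss and its gradient for a hypothetical tabular softmax policy.

    theta[s][a] is the logit of action a in state s.
    episode is a list of (state, action, reward) tuples.
    loss = -sum_t log pi(a_t | s_t) * A_t, where A_t = G_t - b and the
    baseline b is assumed to be the mean return of the episode.
    """
    states, actions, rewards = zip(*episode)
    returns = discounted_returns(list(rewards), gamma)
    baseline = sum(returns) / len(returns)

    loss = 0.0
    grad = [[0.0] * len(row) for row in theta]
    for s, a, g in zip(states, actions, returns):
        advantage = g - baseline
        probs = softmax(theta[s])
        # Negating log pi * A turns gradient ascent on the expected return
        # into gradient descent on a loss.
        loss -= math.log(probs[a]) * advantage
        # d log pi(a|s) / d theta[s][k] = 1[k == a] - pi(k|s); negated for the loss.
        for k, p in enumerate(probs):
            indicator = 1.0 if k == a else 0.0
            grad[s][k] -= (indicator - p) * advantage
    return loss, grad


def gradient_descent_step(theta, grad, lr=0.1):
    """Plain gradient descent: theta <- theta - lr * dL/dtheta."""
    return [[w - lr * dw for w, dw in zip(row, grow)]
            for row, grow in zip(theta, grad)]


if __name__ == "__main__":
    # One state, two actions; action 1 was followed by the higher return.
    theta = [[0.0, 0.0]]
    episode = [(0, 1, 1.0), (0, 0, 0.0)]
    loss, grad = reinforce_loss_and_grad(theta, episode, gamma=1.0)
    theta = gradient_descent_step(theta, grad, lr=0.1)
    print("pi(a|s=0) after one step:", softmax(theta[0]))
```

Because the loss is the negated objective, one descent step on it raises the probability of actions whose return beat the baseline and lowers the others. With a neural-network policy in PyTorch, the same loss would be built from tensors and `loss.backward()` would replace the hand-derived gradient.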
