Get full access to Reinforcement Learning Algorithms with Python and 60K+ other titles, with a free 10-day trial of O'Reilly.

There are also live events, courses curated by job role, and more.

Start your free trial

Implementing ME-TRPO

The code of ME-TRPO is quite long and, in this section, we won't give you the full code. Also, many parts are not interesting, and all the code concerning TRPO has already been discussed in Chapter 7, TRPO and PPO Implementation. However, if you are interested in the complete implementation, or if you want to play with the algorithm, the full code is available in the GitHub repository of this chapter.

Here, we'll provide an explanation and the implementation of the following:

The inner cycle, where the games are simulated and the policy is optimized
The function that trains the models

The remaining code is very similar to that of TRPO.

The following steps will guide us through the process of building and implementing ...

Get Reinforcement Learning Algorithms with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Don’t leave empty-handed

Get Mark Richards’s Software Architecture Patterns ebook to better understand how to design components—and how they should interact.

It’s yours, free.

Get it now

Check it out now on O’Reilly

Dive in for free with a 10-day trial of the O’Reilly learning platform—then explore all the other resources our members count on to build skills and solve problems every day.

Start your free trial Become a member now