Implementing ME-TRPO

The ME-TRPO code is quite long, so we won't reproduce it in full in this section. Much of it is routine, and everything concerning TRPO has already been discussed in Chapter 7, TRPO and PPO Implementation. If you are interested in the complete implementation, or if you want to experiment with the algorithm, the full code is available in the GitHub repository of this chapter.

Here, we'll explain and implement the following:

  • The inner cycle, where the games are simulated and the policy is optimized
  • The function that trains the models

The remaining code is very similar to that of TRPO.
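
To make the two pieces listed above more concrete before going into the details, here is a minimal sketch of how they fit together. It is not the code from the repository: it assumes a hypothetical one-dimensional environment (`real_step`), linear dynamics models instead of neural networks, and a simple hill-climbing update standing in for the TRPO step. Every function name and hyperparameter is an illustrative assumption. The sketch shows a model-training function that keeps the weights with the lowest validation loss, and an inner cycle that improves the policy using only games simulated by the ensemble, where (by assumption in this sketch) a randomly chosen model predicts each step.

```python
import random

def real_step(s, a):
    # Hypothetical 1-D "real" environment, used only to collect training data.
    return 0.9 * s + 0.5 * a

def predict(w, s, a):
    # Linear dynamics model (an assumption): s' ~ w[0]*s + w[1]*a + w[2]
    return w[0] * s + w[1] * a + w[2]

def mse(w, batch):
    return sum((predict(w, s, a) - s2) ** 2 for s, a, s2 in batch) / len(batch)

def train_model(data, seed, epochs=200, lr=0.05):
    """Fit one ensemble member by SGD and return the weights with the lowest validation loss."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)                       # each member gets its own train/validation split
    split = int(0.8 * len(data))
    train, valid = data[:split], data[split:]
    w = [rng.uniform(-0.1, 0.1) for _ in range(3)]  # member-specific initialization
    best_w, best_loss = list(w), mse(w, valid)
    for _ in range(epochs):
        rng.shuffle(train)
        for s, a, s2 in train:
            err = predict(w, s, a) - s2
            w[0] -= lr * err * s
            w[1] -= lr * err * a
            w[2] -= lr * err
        loss = mse(w, valid)
        if loss < best_loss:                # remember the best weights seen so far
            best_w, best_loss = list(w), loss
    return best_w, best_loss

def simulated_return(models, k, starts, horizon, rng):
    """Average return of the policy a = -k*s over games simulated by the ensemble."""
    total = 0.0
    for s in starts:
        for _ in range(horizon):
            w = rng.choice(models)          # a random member predicts each step
            s = predict(w, s, -k * s)
            total -= s * s                  # toy reward: stay close to zero
    return total / len(starts)

def inner_cycle(models, k, rng, iters=200, step=0.2, horizon=20):
    """Improve the policy on simulated games only (hill climbing stands in for TRPO)."""
    starts = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    best = simulated_return(models, k, starts, horizon, rng)
    for _ in range(iters):
        cand = k + rng.uniform(-step, step)
        ret = simulated_return(models, cand, starts, horizon, rng)
        if ret > best:                      # keep the candidate only if it improves
            k, best = cand, ret
    return k

def demo(seed=0, n_models=5):
    rng = random.Random(seed)
    data = []
    for _ in range(100):                    # random interaction with the real environment
        s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((s, a, real_step(s, a)))
    fitted = [train_model(data, seed=i) for i in range(n_models)]
    models = [w for w, _ in fitted]
    losses = [loss for _, loss in fitted]
    k = inner_cycle(models, 0.0, rng)
    return models, losses, k
```

Calling `demo()` trains the ensemble on real transitions and then runs the inner cycle entirely on simulated ones. In this toy problem, a gain `k` close to 1.8 drives the state to zero in one step, so the inner cycle should move `k` in that direction. In the actual algorithm, the linear models would be neural networks and the hill-climbing update would be a TRPO step.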

The following steps will guide us through the process of building and implementing ...
