Theory behind TRPO

Let's see the mechanism behind TRPO. If you feel that this part is hard to understand, you can skip it and go directly to how to run TRPO to solve MuJoCo control tasks. Consider an infinite-horizon discounted Markov decision process denoted by , where  is a finite set of states,  is a finite set of actions,  is the transition probability ...

Get Python Reinforcement Learning Projects now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.