September 2018
Intermediate to advanced
296 pages
9h 10m
English
Let's see the mechanism behind TRPO. If you feel that this part is hard to understand, you can skip it and go directly to how to run TRPO to solve MuJoCo control tasks. Consider an infinite-horizon discounted Markov decision process denoted by $(\mathcal{S}, \mathcal{A}, P, \ldots)$, where $\mathcal{S}$ is a finite set of states, $\mathcal{A}$ is a finite set of actions, $P$ is the transition probability ...