September 2018
Intermediate to advanced
296 pages
9h 10m
English
The trust region policy optimization (TRPO) algorithm was proposed for solving complex continuous control tasks in the following paper: J. Schulman, S. Levine, P. Moritz, M. Jordan, and P. Abbeel. Trust Region Policy Optimization. In ICML, 2015.
Understanding why TRPO works requires some mathematical background. The main idea is that it is better to guarantee that the new policy obtained from one training step not only monotonically decreases the optimization loss function (and thus improves the policy), but also does not deviate much from the previous policy, which means that there should be a constraint on the ...
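The two requirements described above can be illustrated with a small NumPy sketch. It is not the book's implementation: it assumes discrete actions, measures how far the new policy has moved with the mean KL divergence (the measure used in the TRPO paper), and uses hypothetical names (`surrogate_loss_and_kl`, `accept_step`, `max_kl`) chosen only for this example. A candidate policy is accepted only if the importance-weighted surrogate loss goes down and the policy stays within the KL budget.

```python
import numpy as np

def surrogate_loss_and_kl(old_probs, new_probs, actions, advantages):
    """Return (surrogate loss, mean KL(old || new)) for categorical policies.

    old_probs, new_probs: arrays of shape (N, A) with action probabilities
    for N sampled states. actions: shape (N,), the actions actually taken.
    advantages: shape (N,), advantage estimates for those actions.
    """
    idx = np.arange(len(actions))
    # Importance ratio pi_new(a|s) / pi_old(a|s) for the sampled actions.
    ratio = new_probs[idx, actions] / old_probs[idx, actions]
    # Loss is the negated surrogate objective, so lower is better.
    loss = -np.mean(ratio * advantages)
    # Mean KL divergence from the old policy to the new one.
    kl = np.mean(np.sum(old_probs * np.log(old_probs / new_probs), axis=1))
    return loss, kl

def accept_step(old_probs, new_probs, actions, advantages, max_kl=0.01):
    """Accept only if the loss decreases and the policy stays close.

    max_kl is an illustrative threshold, not a value taken from the text.
    """
    # Loss of the unchanged policy (ratio is exactly 1 everywhere).
    old_loss, _ = surrogate_loss_and_kl(old_probs, old_probs, actions, advantages)
    new_loss, kl = surrogate_loss_and_kl(old_probs, new_probs, actions, advantages)
    return bool(new_loss < old_loss and kl <= max_kl)

old = np.array([[0.5, 0.5], [0.5, 0.5]])
small_step = np.array([[0.55, 0.45], [0.45, 0.55]])
large_step = np.array([[0.9, 0.1], [0.1, 0.9]])
actions = np.array([0, 1])
advantages = np.array([1.0, 1.0])

print(accept_step(old, small_step, actions, advantages))
print(accept_step(old, large_step, actions, advantages))
```

Both candidate policies lower the surrogate loss, but only the small step stays inside the KL budget. The large step is rejected even though its loss is lower, which is the behavior the constraint is meant to enforce.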