October 2019
Intermediate to advanced
366 pages
12h 4m
English
Trust region policy optimization (TRPO) is the first successful algorithm to use several approximations of the natural gradient in order to train a deep neural network policy in a more controlled and stable way. As we saw with NPG, computing the inverse of the Fisher information matrix (FIM) is impractical for nonlinear functions with a large number of parameters. TRPO builds on NPG and overcomes this difficulty by introducing a surrogate objective function and making a series of approximations. As a result, it succeeds in learning complex policies for walking, hopping, or playing Atari games from raw pixels.
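To make the idea of avoiding the FIM inverse more concrete, here is a minimal sketch of the conjugate gradient method, one approximation commonly used in TRPO implementations to solve F x = g using only matrix-vector products. The names `conjugate_gradient` and `fvp` are hypothetical, and the small random positive-definite matrix `F` is only a stand-in for the FIM; in a real agent, `fvp` would be computed from the policy network rather than from an explicit matrix.

```python
import numpy as np

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g using only products fvp(v) = F v.

    F is never inverted or even stored; it is assumed to be
    symmetric positive definite (an assumption of this sketch).
    """
    x = np.zeros_like(g)
    r = g.copy()          # residual g - F x, with x = 0 at the start
    p = g.copy()          # current search direction
    rs_old = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rs_old / (p @ Fp)   # step length along p
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:            # squared residual is small enough
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy stand-in for the FIM: a small symmetric positive-definite matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
F = A @ A.T + 0.1 * np.eye(5)
g = rng.normal(size=5)    # stand-in for the policy gradient

# Natural-gradient-like direction, obtained without computing inv(F).
step_dir = conjugate_gradient(lambda v: F @ v, g, iters=50)
```

The solver only ever calls `fvp`, so the cost of each iteration is that of one matrix-vector product, which is what makes this kind of approach feasible when the number of parameters is large.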
TRPO is one of the most complex model-free algorithms and though we already ...