January 2020
Intermediate to advanced
432 pages
10h 18m
English
The fundamental problem we need to address with policy methods is converting gradient ascent to a natural gradient form. Previously, we handled this gradient by simply applying the log function to the policy's action probabilities. However, this does not yield a natural gradient. Natural gradients are invariant to how the model is parameterized, which makes them a more stable way to compute updates. Let's look at how this is done in code by opening up our IDE to the TRPO example again and following the next exercise.
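Before turning to the TRPO code, the idea can be illustrated in isolation. The following is a minimal NumPy sketch, not the book's TRPO example. It assumes a softmax policy over three discrete actions with made-up advantage values. The names `softmax` and `natural_gradient` and the `damping` value are hypothetical choices for illustration. The sketch computes the ordinary log-probability gradient and then rescales it by the inverse of the Fisher information matrix, which is what turns it into a natural gradient.

```python
import numpy as np


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def natural_gradient(logits, advantages, damping=1e-3):
    """Return (vanilla gradient, natural gradient, Fisher matrix).

    Illustrative only: assumes a softmax policy whose parameters are
    the logits themselves, and exact expectations over actions.
    """
    probs = softmax(logits)
    n = len(probs)

    # Row a holds the score vector d/d(logits) log pi(a) = one_hot(a) - probs.
    scores = np.eye(n) - probs

    # Vanilla policy gradient: E_a[ A(a) * score(a) ].
    vanilla = (probs * advantages) @ scores

    # Fisher information matrix: E_a[ score(a) score(a)^T ].
    fisher = (scores * probs[:, None]).T @ scores

    # The softmax Fisher matrix is singular (shifting all logits changes
    # nothing), so a small damping term is added before solving.
    natural = np.linalg.solve(fisher + damping * np.eye(n), vanilla)
    return vanilla, natural, fisher


if __name__ == "__main__":
    logits = np.array([1.0, 0.0, -1.0])      # assumed policy parameters
    advantages = np.array([1.0, 0.0, -1.0])  # assumed advantage estimates
    vanilla, natural, fisher = natural_gradient(logits, advantages)
    print("vanilla gradient:", vanilla)
    print("natural gradient:", natural)
```

Solving the damped linear system directly is only practical for a tiny parameter vector like this one. For a neural network policy the Fisher matrix is far too large to build explicitly, so it has to be handled approximately.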