October 2019
Intermediate to advanced
366 pages
12h 4m
English
The novelty of the NPG algorithm lies in its parameter update, whose step combines first- and second-derivative information. To understand the natural policy gradient step, we have to explain two key concepts: the Fisher Information Matrix (FIM) and the Kullback-Leibler (KL) divergence. Before doing so, let's look at the formula behind the update:
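$$\theta_{k+1} = \theta_k + \alpha F^{-1} \nabla_\theta J(\theta) \qquad (7.1)$$

This update differs from the vanilla policy gradient only by the term $F^{-1}$, the inverse of the Fisher Information Matrix, which is used to enhance the gradient term.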
In this formula, ...
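To make the step concrete, here is a minimal sketch rather than the book's implementation. It assumes the standard form of the natural gradient step, in which the parameters move along the gradient preconditioned by the inverse FIM and scaled by a step size. It also assumes a one-dimensional Gaussian policy parameterized by its mean and log standard deviation. The helper names (`score`, `estimate_fisher`, `solve_2x2`, `npg_step`), the damping constant, and the gradient vector are all hypothetical and chosen only for illustration.

```python
import math
import random


def score(theta, a):
    """Gradient of log N(a | mu, sigma^2) w.r.t. theta = (mu, log_sigma)."""
    mu, log_sigma = theta
    sigma = math.exp(log_sigma)
    z = (a - mu) / sigma
    return (z / sigma, z * z - 1.0)


def estimate_fisher(theta, actions):
    """Monte Carlo FIM estimate: average outer product of the score vectors."""
    f = [[0.0, 0.0], [0.0, 0.0]]
    for a in actions:
        s = score(theta, a)
        for i in range(2):
            for j in range(2):
                f[i][j] += s[i] * s[j] / len(actions)
    return f


def solve_2x2(f, g, damping=1e-3):
    """Solve (F + damping * I) x = g with Cramer's rule (assumed damping)."""
    a, b = f[0][0] + damping, f[0][1]
    c, d = f[1][0], f[1][1] + damping
    det = a * d - b * c
    return ((d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det)


def npg_step(theta, grad, fisher, alpha=0.1):
    """One natural gradient step: theta + alpha * F^{-1} * grad."""
    natural_grad = solve_2x2(fisher, grad)
    return tuple(t + alpha * n for t, n in zip(theta, natural_grad))


random.seed(0)
sigma = 2.0
theta = (0.0, math.log(sigma))          # mu = 0, sigma = 2
actions = [random.gauss(theta[0], sigma) for _ in range(20000)]
fisher = estimate_fisher(theta, actions)

grad = (1.0, 0.0)                        # hypothetical policy gradient estimate
vanilla = tuple(t + 0.1 * g for t, g in zip(theta, grad))
natural = npg_step(theta, grad, fisher)
print("vanilla step:", vanilla)
print("natural step:", natural)
```

For this policy the exact FIM is diagonal, with entries 1/σ² and 2, so the natural step rescales the update of the mean by roughly σ². The same gradient therefore moves the mean further when the policy is wide and less when it is narrow, which is the sense in which the inverse FIM adjusts the plain gradient.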