June 2018
Intermediate to advanced
546 pages
13h 30m
English
This section is optional and is included for readers who are interested in why the method works. If you wish, you can refer to the original paper on the cross-entropy method, which is cited at the end of the section.
The basis of the cross-entropy method lies in the importance sampling theorem, which states this:
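In its standard form, assuming p(x) and q(x) are probability densities and q(x) > 0 wherever p(x)H(x) is nonzero, the identity can be written as:

```latex
\mathbb{E}_{x \sim p(x)}[H(x)]
  = \int_x p(x)\,H(x)\,dx
  = \int_x q(x)\,\frac{p(x)}{q(x)}\,H(x)\,dx
  = \mathbb{E}_{x \sim q(x)}\!\left[\frac{p(x)}{q(x)}\,H(x)\right]
```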
In our RL case, H(x) is a reward value obtained by some policy x, and p(x) is a distribution over all possible policies. We don't want to maximize our reward by searching all possible policies; instead, we want to find a way to approximate p(x)H(x) by q(x), iteratively minimizing the distance between ...
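The importance sampling idea can be checked numerically. The following is a minimal sketch, not taken from the book: it estimates the expectation of H(x) under p(x) using only samples drawn from a different distribution q(x), reweighting each sample by p(x)/q(x). The specific choices here (p = N(0, 1), q = N(1, 2^2), H(x) = x^2, and the helper names `normal_pdf` and `importance_sampling_estimate`) are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(h, p_pdf, q_pdf, q_sampler, n_samples, rng):
    """Estimate E_{x~p}[h(x)] using samples drawn from q only."""
    total = 0.0
    for _ in range(n_samples):
        x = q_sampler(rng)
        weight = p_pdf(x) / q_pdf(x)  # importance weight p(x)/q(x)
        total += weight * h(x)
    return total / n_samples


rng = random.Random(0)  # fixed seed so the run is repeatable

# Illustrative choices (assumptions, not from the text):
# p = N(0, 1), q = N(1, 2^2), H(x) = x^2.
estimate = importance_sampling_estimate(
    h=lambda x: x * x,
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 1.0, 2.0),
    q_sampler=lambda r: r.gauss(1.0, 2.0),
    n_samples=200_000,
    rng=rng,
)
print(estimate)  # approximately 1.0, since E_p[x^2] = 1 for p = N(0, 1)
```

The sampling distribution q is deliberately wider than p in this sketch. If q had lighter tails than p, the weights p(x)/q(x) could become very large and the estimate would be much noisier.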