October 2019
Intermediate to advanced
366 pages
12h 4m
English
In the paper, a version of ES is used that maximizes the average objective value, as follows:

It does this by searching over a population,
, that's parameterized by
with stochastic gradient ascent.
is the objective function (or fitness function) while is the parameters of the actor. In our problems, is simply the stochastic return that's ...
Read now
Unlock full access