October 2019
Intermediate to advanced
366 pages
12h 4m
English
A first problem arises with equation (6.2): as formulated, the gradient of the objective function depends on the distribution of states induced by the policy.

We could use a stochastic approximation of that expectation, but computing the distribution of the states still requires a complete model of the environment. Thus, this formulation isn't suitable for our purposes.
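To make the dependence on the model concrete, here is a minimal sketch of how the state distribution of a policy could be computed for a tiny tabular problem. Everything in it is an assumption made for illustration and does not come from the book: the two-state, two-action transition table `P`, the one-parameter toy `policy`, and the choice of the stationary distribution (found by power iteration) as the state distribution.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
# P[s][a][s2] = probability of moving from state s to state s2 under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]


def policy(theta):
    """Toy policy: in every state, pick action 1 with probability theta."""
    return [[1.0 - theta, theta] for _ in P]


def state_distribution(P, pi, iters=1000):
    """Stationary state distribution under pi, found by power iteration.

    Note that this requires the full transition table P, that is,
    a complete model of the environment.
    """
    n = len(P)
    # State-to-state transition matrix induced by the policy.
    P_pi = [
        [sum(pi[s][a] * P[s][a][s2] for a in range(len(pi[s]))) for s2 in range(n)]
        for s in range(n)
    ]
    d = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        d = [sum(d[s] * P_pi[s][s2] for s in range(n)) for s2 in range(n)]
    return d


d_a = state_distribution(P, policy(0.2))
d_b = state_distribution(P, policy(0.8))
print(d_a, d_b)
```

The two printed distributions differ, which shows that the state distribution changes with the policy parameter, and `state_distribution` cannot be evaluated without `P`. In a model-free setting `P` is unknown, which is exactly the obstacle described above.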
The policy gradient theorem comes to the rescue here. Its purpose is to provide an analytical formulation to ...