October 2018
The simplest policy gradient method is REINFORCE [5], a Monte Carlo policy gradient method:
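$$\nabla J(\theta) = \mathbb{E}_{\pi}\left[ R_t \, \nabla_\theta \ln \pi(a_t \mid s_t, \theta) \right]$$
(Equation 10.2.1)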
where $R_t$ is the return as defined in Equation 9.1.2. $R_t$ is an unbiased sample of $Q^{\pi}(s_t, a_t)$ in the policy gradient theorem.
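The "unbiased sample" statement can be written out explicitly. The following is a sketch in standard notation, not necessarily the exact form used earlier in the book: the proportionality sign and the symbol $Q^{\pi}$ for the action-value function are assumptions here. The last equality holds because the expected return, conditioned on the state and action, is the action value.

```latex
\begin{aligned}
Q^{\pi}(s_t, a_t) &= \mathbb{E}_{\pi}\left[ R_t \mid s_t, a_t \right] \\
\nabla J(\theta) &\propto \mathbb{E}_{\pi}\left[ Q^{\pi}(s_t, a_t)\, \nabla_\theta \ln \pi(a_t \mid s_t, \theta) \right]
  = \mathbb{E}_{\pi}\left[ R_t\, \nabla_\theta \ln \pi(a_t \mid s_t, \theta) \right]
\end{aligned}
```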
Algorithm 10.2.1 summarizes the REINFORCE algorithm [2]. REINFORCE is a Monte Carlo algorithm. It does not require knowledge of the dynamics of the environment; that is, it is model-free. Only experience samples, $\langle s_i, a_i, r_{i+1}, s_{i+1} \rangle$, are needed to optimally tune the parameters of the policy ...
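The model-free, Monte Carlo nature of REINFORCE can be illustrated with a minimal sketch. This is not the book's Algorithm 10.2.1 listing: the four-state corridor environment, the tabular softmax policy, the function names, and the values of `GAMMA` and `ALPHA` are all assumptions made for illustration. The update also includes a $\gamma^t$ factor, as in the common textbook statement of the algorithm. The learner only ever sees sampled states, actions, and rewards, never the transition rules.

```python
import math
import random

GAMMA = 0.9          # discount factor (assumed value)
ALPHA = 0.1          # learning rate (assumed value)
N_STATES = 4         # hypothetical corridor: states 0..3, goal at state 3
ACTIONS = (-1, +1)   # action 0 moves left, action 1 moves right
MAX_STEPS = 20       # episodes are cut off after this many steps


def softmax_policy(theta, s):
    """Return the action probabilities pi(.|s, theta) of a tabular softmax policy."""
    m = max(theta[s])
    exps = [math.exp(x - m) for x in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def run_episode(theta, rng):
    """Sample one episode and return a list of (state, action index, reward)."""
    s, trajectory = 0, []
    for _ in range(MAX_STEPS):
        probs = softmax_policy(theta, s)
        a = 0 if rng.random() < probs[0] else 1
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0  # reward only at the goal
        trajectory.append((s, a, r))
        s = s_next
        if s == N_STATES - 1:
            break
    return trajectory


def discounted_returns(rewards, gamma):
    """Compute the return for every step by accumulating rewards backwards."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def reinforce(episodes=5000, seed=0):
    """Train the softmax policy with Monte Carlo policy gradient updates."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        trajectory = run_episode(theta, rng)
        returns = discounted_returns([r for _, _, r in trajectory], GAMMA)
        for t, ((s, a, _), g) in enumerate(zip(trajectory, returns)):
            probs = softmax_policy(theta, s)
            for b in range(len(ACTIONS)):
                # Gradient of ln pi(a|s, theta) w.r.t. theta[s][b] for softmax
                grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += ALPHA * (GAMMA ** t) * g * grad_log_pi
    return theta


if __name__ == "__main__":
    learned = reinforce()
    for state in range(N_STATES - 1):
        print(state, softmax_policy(learned, state))
```

Because each update waits for the full episode return, the gradient estimate is unbiased but can have high variance; the small learning rate in this sketch is one simple way to keep the updates stable.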