October 2019
Intermediate to advanced
366 pages
12h 4m
English
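The action-value function that uses one-step bootstrapping is defined as follows:

$$Q(s_t, a_t) = r_t + \gamma V(s_{t+1})$$

Here, $s_{t+1}$ is the next state, $r_t$ is the reward received after taking action $a_t$ in state $s_t$, and $\gamma$ is the discount factor.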
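A minimal sketch of the one-step bootstrapped estimate $r_t + \gamma V(s_{t+1})$ follows. It assumes the critic has already produced $V(s_{t+1})$ as a plain float. It also assumes the common convention that a terminal next state has value 0, which the text does not state. The function names are hypothetical.

```python
from typing import List


def one_step_q_estimate(reward: float, next_state_value: float, gamma: float, done: bool) -> float:
    """One-step bootstrapped estimate Q(s_t, a_t) ~ r_t + gamma * V(s_{t+1})."""
    # Assumption: if s_{t+1} is terminal, its value is taken to be 0,
    # so only the immediate reward r_t remains.
    return reward + (0.0 if done else gamma * next_state_value)


def batch_q_estimates(
    rewards: List[float], next_values: List[float], dones: List[bool], gamma: float
) -> List[float]:
    """Apply the one-step estimate to a batch of transitions."""
    return [one_step_q_estimate(r, v, gamma, d) for r, v, d in zip(rewards, next_values, dones)]
```

For example, with `gamma=0.5`, a reward of `1.0`, and a next-state value of `4.0`, the estimate is `1.0 + 0.5 * 4.0`. Only one real reward is needed; the rest of the return is supplied by the critic.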
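Thus, with a $\pi_\theta$ actor and a $V_w$ critic that uses bootstrapping, we obtain a one-step AC update, where $\alpha$ is the actor's learning rate:

$$\theta \leftarrow \theta + \alpha \left( r_t + \gamma V_w(s_{t+1}) - V_w(s_t) \right) \nabla_\theta \log \pi_\theta(a_t \mid s_t)$$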
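The one-step AC update can be illustrated on a small tabular problem. This sketch rests on several assumptions that do not come from the text:

- the actor is a tabular softmax policy with logits `theta[s][a]`;
- a tabular critic `w[s]` stands in for $V_w(s)$;
- the critic is trained with a semi-gradient TD(0) step.

The helper names are hypothetical. For a softmax policy, $\partial \log \pi(a \mid s) / \partial \theta[s][b] = \mathbb{1}[b = a] - \pi(b \mid s)$, which is what the inner loop uses.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_step_ac_update(
    theta: List[List[float]],  # hypothetical tabular actor: theta[s][a] logits
    w: List[float],            # hypothetical tabular critic: w[s] plays V_w(s)
    s: int, a: int, r: float, s_next: int, done: bool,
    gamma: float = 0.99, alpha_actor: float = 0.1, alpha_critic: float = 0.1,
) -> float:
    """Apply one one-step actor-critic update in place and return the TD error."""
    # Bootstrapped target r_t + gamma * V_w(s_{t+1}); assumption: terminal states have value 0.
    target = r + (0.0 if done else gamma * w[s_next])
    # TD error: bootstrapped Q estimate minus the critic's value of the current state.
    delta = target - w[s]

    # Actor: for a softmax policy, d log pi(a|s) / d theta[s][b] = 1{b == a} - pi(b|s).
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha_actor * delta * grad_log_pi

    # Critic: semi-gradient TD(0) step moving V_w(s_t) toward the target (assumption).
    w[s] += alpha_critic * delta
    return delta


# Usage: two states, two actions; taking action 1 in state 0 yields reward 1.0.
theta = [[0.0, 0.0], [0.0, 0.0]]
w = [0.0, 0.0]
delta = one_step_ac_update(theta, w, s=0, a=1, r=1.0, s_next=1, done=False)
```

The target needs only $r_t$ and $V_w(s_{t+1})$, so this update can run after every single transition rather than waiting for the episode to end.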
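This replaces the REINFORCE-with-baseline update:

$$\theta \leftarrow \theta + \alpha \left( G_t - V_w(s_t) \right) \nabla_\theta \log \pi_\theta(a_t \mid s_t)$$

where $G_t$ is the Monte Carlo return from time $t$ onward.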
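For comparison, here is a sketch of the REINFORCE-with-baseline update in the same tabular setting. It makes the same assumptions: a tabular softmax actor `theta[s][a]`, a tabular baseline `w[s]`, and hypothetical helper names. The actor step has the same form as before. The only change is that the advantage uses the Monte Carlo return $G_t$ instead of the bootstrapped target.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def reinforce_baseline_update(
    theta: List[List[float]],               # hypothetical tabular actor logits theta[s][a]
    w: List[float],                         # hypothetical tabular baseline w[s] ~ V_w(s)
    episode: List[Tuple[int, int, float]],  # (s_t, a_t, r_t) for one complete episode
    gamma: float = 0.99, alpha_actor: float = 0.1, alpha_critic: float = 0.1,
) -> None:
    """Apply REINFORCE-with-baseline updates for every step of a finished episode."""
    returns = discounted_returns([r for _, _, r in episode], gamma)
    # Like many practical implementations, this omits the gamma**t factor on each step.
    for (s, a, _), g in zip(episode, returns):
        advantage = g - w[s]  # Monte Carlo return minus the baseline V_w(s_t)
        probs = softmax(theta[s])
        for b in range(len(theta[s])):
            grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += alpha_actor * advantage * grad_log_pi
        w[s] += alpha_critic * advantage  # move the baseline toward G_t
```

$G_t$ depends on every reward until the episode terminates, so this update can only be applied once the episode is complete. The one-step AC update needs only the next transition.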
Note the difference between the use of the state-value ...