April 2018
Intermediate to advanced
334 pages
10h 18m
English
The preceding policy optimization using the Monte Carlo policy gradient approach leads to high variance. In order to tackle this issue, we use a critic to estimate the state-action value function, that is as follows:

This gives rise to the famous actor-critic algorithms. The actor-critic algorithm, as the name suggests, maintains two networks for the following purposes:
The following ...
Read now
Unlock full access