October 2019
Intermediate to advanced
366 pages
12h 4m
English
Policy iteration cycles between two steps: policy evaluation, which updates the value function under the current policy using formula (8), and policy improvement, formula (9), which computes a new policy using the improved value function. Eventually, after a number of cycles, the algorithm ...
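
The evaluate-then-improve cycle described above can be sketched in a few lines of NumPy. Formulas (8) and (9) are not reproduced in this excerpt, so the sketch assumes the usual textbook forms: an iterated Bellman expectation backup for evaluation and a greedy argmax over action values for improvement. The function name `policy_iteration`, the array layouts of `P` and `R`, the discount `gamma`, the tolerance `tol`, and the two-state toy MDP are all illustrative assumptions, not taken from the book.

```python
import numpy as np


def policy_iteration(P, R, gamma=0.9, tol=1e-8, max_cycles=1000):
    """Tabular policy iteration (illustrative sketch).

    P: array of shape (S, A, S), P[s, a, s2] = transition probability.
    R: array of shape (S, A), expected immediate reward for (s, a).
    Returns (policy, V, cycles_used).
    """
    n_states = P.shape[0]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # arbitrary starting policy
    V = np.zeros(n_states)

    for cycle in range(1, max_cycles + 1):
        # Policy evaluation: repeat the Bellman expectation backup
        # for the current policy until the values stop changing.
        P_pi = P[states, policy]  # (S, S) transitions under the policy
        R_pi = R[states, policy]  # (S,) rewards under the policy
        while True:
            V_new = R_pi + gamma * P_pi @ V
            delta = np.max(np.abs(V_new - V))
            V = V_new
            if delta < tol:
                break

        # Policy improvement: act greedily with respect to the
        # action values implied by the freshly evaluated V.
        Q = R + gamma * P @ V  # (S, A)
        new_policy = np.argmax(Q, axis=1)

        # A policy that no longer changes is greedy w.r.t. its own values.
        if np.array_equal(new_policy, policy):
            return policy, V, cycle
        policy = new_policy

    return policy, V, max_cycles


# Hypothetical two-state, two-action MDP used only for illustration.
# In each state, action 0 stays put and action 1 moves to the other state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1
])
R = np.array([
    [0.0, 1.0],  # state 0: staying pays 0, moving pays 1
    [2.0, 0.0],  # state 1: staying pays 2, moving pays 0
])

policy, V, cycles = policy_iteration(P, R, gamma=0.9)
print(policy, V, cycles)
```

In this toy problem the loop stops as soon as the improvement step returns the same policy it was given, which is the usual stopping test for tabular policy iteration. Exact evaluation by solving the linear system would be an alternative to the inner loop; the iterative backup is used here only because it mirrors the update-style description in the text.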