January 2019
In this section, we'll put together everything we've learned so far and combine policy evaluation and policy improvement into a single algorithm (so exciting!). Fortunately, the concepts are simple.
We'll start with policy iteration. It alternates between policy evaluation and policy improvement until the process converges, that is, until an improvement step no longer changes the policy. Here is a sample diagram of the policy iteration steps:

Policy iteration has one disadvantage: it performs evaluation in each iteration. Evaluation itself is an iterative process, which might be time-consuming. It turns out that we can improve its performance ...
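
To make the alternation concrete, here is a minimal sketch of plain policy iteration (not the faster variant hinted at above) in Python. The three-state MDP, the `P` transition-table layout, the discount factor, and all function names are assumptions made up for illustration; they are not taken from the book. Note the inner `while` loop in `policy_evaluation`: it runs to convergence on every outer iteration, which is where the cost comes from.

```python
# Hypothetical 3-state MDP, invented for illustration only.
# P[state][action] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing state
}
GAMMA = 0.9  # assumed discount factor


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P, policy, gamma, theta=1e-8):
    """Iteratively compute the value function of a fixed policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update
        if delta < theta:
            return V


def policy_improvement(P, V, gamma):
    """Return the policy that is greedy with respect to V."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


def policy_iteration(P, gamma):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = {s: 0 for s in P}  # arbitrary starting policy
    iterations = 0
    while True:
        V = policy_evaluation(P, policy, gamma)       # full evaluation
        new_policy = policy_improvement(P, V, gamma)  # greedy improvement
        iterations += 1
        if new_policy == policy:
            return policy, V, iterations
        policy = new_policy


policy, V, iterations = policy_iteration(P, GAMMA)
print(policy, V, iterations)
```

On this toy problem the loop settles on the policy that moves right from states 0 and 1 toward the rewarding transition. The stopping test compares whole policies, so it relies on `max` breaking ties the same way each time.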