January 2019
Intermediate to advanced
386 pages
11h 13m
English
Now that we can evaluate a policy, let's look at how to improve it. This task is also known as control. We'll assume that the policy is represented as a table that stores the best action for each state (a tabular solution). We'll also assume that we already have a value function for that policy (computed in the policy evaluation step described in the preceding section) and the policy itself, π. For each state, s, we'll do the following: