April 2018
Intermediate to advanced
334 pages
10h 18m
English
Policy iteration obtains the optimal utility by iterating over the policy and updating the policy itself, rather than the value, until the policy converges to the optimum. The process of policy iteration is as follows:

For the policy π_t at iteration step t, calculate its utility U_t by using the following formula:
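As a rough illustration of the loop described here, the following Python sketch assumes the standard textbook form of policy iteration: an evaluation step that computes U(s) = R(s) + γ Σ T(s, π(s), s') U(s') for the current policy, followed by a greedy improvement step, repeated until the policy stops changing. The two-state MDP, the function names (`evaluate_policy`, `improve_policy`, `policy_iteration`), and the dictionary layout of `T` and `R` are all hypothetical choices made for this example, not taken from the text.

```python
# Hypothetical two-state MDP, for illustration only.
# T[state][action] is a list of (probability, next_state) pairs.
T = {
    0: {"stay": [(1.0, 0)], "move": [(1.0, 1)]},
    1: {"stay": [(1.0, 1)], "move": [(1.0, 0)]},
}
R = {0: 0.0, 1: 1.0}   # reward received in each state
GAMMA = 0.9            # discount factor (assumed value)


def expected_utility(U, outcomes):
    """Expected utility of the next state over (probability, state) pairs."""
    return sum(p * U[s2] for p, s2 in outcomes)


def evaluate_policy(policy, T, R, gamma, tol=1e-10):
    """Policy evaluation: sweep the states until the utilities of the
    fixed policy change by less than tol."""
    U = {s: 0.0 for s in T}
    while True:
        delta = 0.0
        for s in T:
            new_u = R[s] + gamma * expected_utility(U, T[s][policy[s]])
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u
        if delta < tol:
            return U


def improve_policy(U, T):
    """Policy improvement: in each state, pick the action with the
    highest expected next-state utility."""
    return {
        s: max(T[s], key=lambda a: expected_utility(U, T[s][a]))
        for s in T
    }


def policy_iteration(T, R, gamma, initial_policy):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = dict(initial_policy)
    while True:
        U = evaluate_policy(policy, T, R, gamma)
        new_policy = improve_policy(U, T)
        if new_policy == policy:
            return policy, U
        policy = new_policy


policy, U = policy_iteration(T, R, GAMMA, {0: "stay", 1: "stay"})
print(policy)
```

Starting from the arbitrary policy of staying in both states, the loop in this toy problem settles on moving out of state 0 and staying in state 1, which is the only state that pays a reward. The evaluation step here is iterative; for small problems it could also be solved exactly as a linear system.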