Algorithm 7.2
Policy Iteration Algorithm
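(1) Initialize the policy π(s) arbitrarily for all states s.
(2) Policy evaluation: compute the value function Vπ of the current policy π.
(3) Policy improvement: for each state s, let π′(s) be an action that maximizes the expected return of taking that action in s and following π afterwards, evaluated using Vπ.
(4) If π′ = π, stop. Otherwise set π=π′ and go to step (2).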
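The four steps of the policy iteration algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's implementation: the transition table `P`, the reward table `R`, the discount factor `gamma`, the tolerance `tol`, and the two-state example MDP are all hypothetical and introduced here only for illustration. Step (2) is carried out by repeated sweeps, in the spirit of iterative policy evaluation.

```python
# Hypothetical MDP encoding (an assumption, not the book's notation):
#   P[s][a] is a list of (probability, next_state) pairs,
#   R[s][a] is the expected immediate reward for taking action a in state s.

def q_value(P, R, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

def evaluate_policy(P, R, policy, gamma, tol=1e-10):
    """Step (2): sweep the states until the value estimates stop changing."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v_new = q_value(P, R, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

def policy_iteration(P, R, gamma=0.9):
    n_states = len(P)
    policy = [0] * n_states                       # step (1): arbitrary start
    while True:
        V = evaluate_policy(P, R, policy, gamma)  # step (2): evaluation
        new_policy = []
        for s in range(n_states):                 # step (3): improvement
            actions = range(len(P[s]))
            best = max(actions, key=lambda a: q_value(P, R, V, s, a, gamma))
            new_policy.append(best)
        if new_policy == policy:                  # step (4): stop if unchanged
            return policy, V
        policy = new_policy

# Illustrative two-state MDP: action 0 stays put, action 1 switches state.
# Only staying in state 1 earns a reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # transitions from state 0
    [[(1.0, 1)], [(1.0, 0)]],  # transitions from state 1
]
R = [
    [0.0, 0.0],  # rewards in state 0
    [1.0, 0.0],  # rewards in state 1
]

policy, V = policy_iteration(P, R, gamma=0.9)
print(policy, [round(v, 6) for v in V])
```

For this toy problem the loop settles on moving from state 0 to state 1 and then staying there, with values close to 9 and 10. Every pass through the outer loop runs a complete policy evaluation, which is the cost that value iteration aims to reduce.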
7.4.2.2 Value Iteration
Each iteration of the policy iteration algorithm 7.2 includes a policy evaluation step (step (2)). In step (3), we update to a new policy π′, and in step (2) of the next iteration we must compute the new value function Vπ′. Although conceptually simple, computing Vπ′ can itself be time-consuming. Alternatively, as noted earlier, we may use iterative policy evaluation (7.48) to evaluate Vπ′, but this may still need many iterations to converge.
A simple modification to avoid this need for policy evaluation in the policy iteration algorithm 7.2 is to instead interweave iterative policy evaluation (7.48) with the policy improvement step (step (3)) in the policy iteration algorithm. In other words, in step (2) of each iteration of the policy ...
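As a rough illustration of interleaving evaluation with improvement, the sketch below uses the standard value iteration update, in which each sweep applies a single backup with a maximum over actions instead of running policy evaluation to convergence. It is a sketch under assumptions rather than the book's own listing: the tables `P` and `R`, the discount factor `gamma`, the tolerance `tol`, and the two-state example MDP are hypothetical names and data chosen for illustration.

```python
# Hypothetical MDP encoding (an assumption, not the book's notation):
#   P[s][a] is a list of (probability, next_state) pairs,
#   R[s][a] is the expected immediate reward for taking action a in state s.

def backup(P, R, V, s, a, gamma):
    """One-step lookahead value of action a in state s under estimate V."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Evaluation and improvement in one move: take the best action's
            # one-step lookahead value as the new estimate for state s.
            v_new = max(backup(P, R, V, s, a, gamma) for a in range(len(P[s])))
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    # Read off a greedy policy from the converged value estimates.
    policy = [
        max(range(len(P[s])), key=lambda a: backup(P, R, V, s, a, gamma))
        for s in range(n_states)
    ]
    return policy, V

# Illustrative two-state MDP: action 0 stays put, action 1 switches state.
# Only staying in state 1 earns a reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # transitions from state 0
    [[(1.0, 1)], [(1.0, 0)]],  # transitions from state 1
]
R = [
    [0.0, 0.0],  # rewards in state 0
    [1.0, 0.0],  # rewards in state 1
]

vi_policy, vi_V = value_iteration(P, R, gamma=0.9)
print(vi_policy, [round(v, 6) for v in vi_V])
```

No explicit policy is stored during the sweeps; a greedy policy is extracted only once the value estimates have stopped changing, which is one common way to organize the computation.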