Algorithm 7.2
Policy Iteration Algorithm
(1) Initialize images arbitrarily for all images.
(2) Policy evaluation:
images(3) Policy improvement: for each images, let
images(4) If π′ = π, stop. Otherwise set π=π′ and go to step (2).

7.4.2.2 Value Iteration

During each iteration of the policy iteration algorithm 7.2, there is a policy evaluation step (step (2)). In step (3) of the policy iteration algorithm, we update to a new policy π′, and then in step (2) of the next iteration, we need to compute the new value function Vπ. Although simple enough, computing Vπ itself may take time. Alternatively, as noted earlier, we may use iterative policy evaluation (7.48) to evaluate Vπ, which may still take time for convergence.

A simple modification to avoid this need for policy evaluation in the policy iteration algorithm 7.2 is to instead interweave iterative policy evaluation (7.48) with the policy improvement step (step (3)) in the policy iteration algorithm. In other words, in step (2) of each iteration of the policy ...

Get Signal Processing for Cognitive Radios now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.