Algorithm 7.2
Policy Iteration Algorithm
(1) Initialize images arbitrarily for all images.
(2) Policy evaluation:
images(3) Policy improvement: for each images, let
images(4) If π′ = π, stop. Otherwise set π=π′ and go to step (2).

7.4.2.2 Value Iteration

During each iteration of the policy iteration algorithm 7.2, there is a policy evaluation step (step (2)). In step (3) of the policy iteration algorithm, we update to a new policy π′, and then in step (2) of the next iteration, we need to compute the new value function Vπ. Although simple enough, computing Vπ itself may take time. Alternatively, as noted earlier, we may use iterative policy evaluation (7.48) to evaluate Vπ, which may still take time for convergence.

A simple modification to avoid this need for policy evaluation in the policy iteration algorithm 7.2 is to instead interweave iterative policy evaluation (7.48) with the policy improvement step (step (3)) in the policy iteration algorithm. In other words, in step (2) of each iteration of the policy ...

Get Signal Processing for Cognitive Radios now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.