Signal Processing for Cognitive Radios

Algorithm 7.2

Policy Iteration Algorithm
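(1) Initialize π(s) arbitrarily for all states s.
(2) Policy evaluation: compute the value function V^π of the current policy π.
(3) Policy improvement: for each state s, let π′(s) be an action that is greedy with respect to V^π.
(4) If π′ = π, stop. Otherwise set π = π′ and go to step (2).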
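
The loop formed by steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration, not the book's implementation. It assumes a finite MDP given as nested lists, where P[s][a][s2] is the probability of moving from state s to state s2 under action a, R[s][a] is the expected immediate reward, and gamma is a discount factor below 1. These names, the tolerance tol, and the toy two-state MDP are all hypothetical. The sketch also assumes that the improvement step is the usual greedy choice with respect to V^π, and it uses repeated sweeps for the evaluation step.

```python
def q_value(P, R, V, s, a, gamma):
    """One-step lookahead: expected return of taking action a in state s."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))


def evaluate_policy(P, R, policy, gamma, tol=1e-10):
    """Step (2): sweep the states until V for the fixed policy stops changing."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v_new = q_value(P, R, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration(P, R, gamma=0.9):
    n_states, n_actions = len(P), len(P[0])
    policy = [0] * n_states  # step (1): arbitrary initial policy
    while True:
        V = evaluate_policy(P, R, policy, gamma)  # step (2)
        # Step (3): act greedily with respect to the evaluated V.
        new_policy = [
            max(range(n_actions), key=lambda a: q_value(P, R, V, s, a, gamma))
            for s in range(n_states)
        ]
        if new_policy == policy:  # step (4): policy is stable, stop
            return policy, V
        policy = new_policy


# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to state 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: action 0 stays, action 1 moves to state 0
R = [[0.0, 1.0],
     [2.0, 0.0]]

policy, V = policy_iteration(P, R, gamma=0.5)
print(policy)  # [1, 0]
```

In this toy case the loop stops after two improvement steps: the best plan is to move from state 0 to state 1 and then stay there.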

7.4.2.2 Value Iteration

Each iteration of the policy iteration algorithm (Algorithm 7.2) includes a policy evaluation step (step (2)). Step (3) produces a new policy π′, so step (2) of the next iteration must compute the new value function V^π′. Although conceptually simple, computing V^π′ exactly can be time-consuming. Alternatively, as noted earlier, iterative policy evaluation (7.48) can be used to evaluate V^π′, but it may still take time to converge.

A simple modification to avoid this need for policy evaluation in the policy iteration algorithm 7.2 is to instead interweave iterative policy evaluation (7.48) with the policy improvement step (step (3)) in the policy iteration algorithm. In other words, in step (2) of each iteration of the policy ...
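
The idea of interleaving evaluation and improvement can be sketched in Python as follows. The source text is cut off at this point, so this shows the commonly used form of value iteration and not necessarily the book's exact update. In this form each sweep applies a single backup per state and takes the maximum over actions immediately, so no separate policy evaluation loop is needed. As assumptions for illustration only, P[s][a][s2] holds transition probabilities, R[s][a] holds expected immediate rewards, gamma is a discount factor below 1, and tol is a stopping tolerance.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10):
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states  # arbitrary initial value function

    def q_value(s, a):
        # One-step lookahead using the current estimate V.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))

    while True:
        delta = 0.0
        for s in range(n_states):
            # Evaluation and improvement merged: back up the best action's value.
            best = max(q_value(s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Read off a greedy policy from the converged value function.
    policy = [
        max(range(n_actions), key=lambda a: q_value(s, a))
        for s in range(n_states)
    ]
    return policy, V


# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to state 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: action 0 stays, action 1 moves to state 0
R = [[0.0, 1.0],
     [2.0, 0.0]]

policy, V = value_iteration(P, R, gamma=0.5)
print(policy)  # [1, 0]
```

The policy is extracted only once, after the values have converged, which is where the saving over a full evaluation per iteration comes from.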
