Value iteration in POMDPs is essentially standard value iteration applied to the belief MDP, whose state space (the set of probability distributions over the underlying states) is continuous and therefore infinite.
At t=0:

$$V_0(b) = 0$$

At t>0:

$$V_t(b) = \max_a \Big[ R(b,a) + \gamma \sum_z p(z \mid b,a)\, V_{t-1}(b') \Big]$$

where b' is the updated belief b'(s') = p(s'|b,a,z), that is, the state estimate after taking action a in belief b and observing z, and R(b,a) is the expected reward over a belief state:

$$R(b,a) = \sum_s b(s)\, R(s,a)$$

where:

b(s) = probability of state s under the belief b
R(s,a) = reward for taking action a in state s
R(b,a) = expected reward over a belief state
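To make the backup concrete, here is a minimal sketch of value iteration on a discretized belief space for a hypothetical two-state, two-action, two-observation POMDP. The transition model T, observation model O, reward table R, and all numeric values are illustrative assumptions, not from the text; the belief simplex is sampled on a grid and V_{t-1}(b') is linearly interpolated between grid points.

```python
import numpy as np

# Hypothetical two-state, two-action, two-observation POMDP (assumed values).
n_actions, n_obs = 2, 2
gamma = 0.95

# T[a, s, s'] = p(s' | s, a) -- transition model
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
# O[a, s', z] = p(z | s', a) -- observation model
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.5, 0.5]]])
# R[s, a] = reward for taking action a in state s
R = np.array([[1.0, 0.0], [0.0, 2.0]])

# With two states, a belief b is fully described by b(s0); discretize [0, 1].
grid = np.linspace(0.0, 1.0, 101)
beliefs = np.stack([grid, 1.0 - grid], axis=1)   # shape (101, 2)

V = np.zeros(len(grid))                          # base case: V_0(b) = 0

def backup(V):
    """One step of V_t(b) = max_a [ R(b,a) + gamma * sum_z p(z|b,a) V_{t-1}(b') ]."""
    V_new = np.empty_like(V)
    for i, b in enumerate(beliefs):
        q = np.zeros(n_actions)
        for a in range(n_actions):
            q[a] = b @ R[:, a]                   # R(b,a) = sum_s b(s) R(s,a)
            pred = b @ T[a]                      # p(s'|b,a) = sum_s b(s) T(s'|s,a)
            for z in range(n_obs):
                joint = pred * O[a][:, z]        # p(s', z | b, a)
                p_z = joint.sum()                # p(z | b, a)
                if p_z > 1e-12:
                    b_next = joint / p_z         # b'(s') = p(s'|b,a,z)
                    # interpolate V_{t-1} at the updated belief
                    q[a] += gamma * p_z * np.interp(b_next[0], grid, V)
        V_new[i] = q.max()
    return V_new

for _ in range(200):                             # iterate toward convergence
    V = backup(V)
```

Note that exact POMDP value iteration instead represents each V_t as a finite set of alpha-vectors, exploiting the fact that V_t is piecewise-linear and convex in b; the grid approximation above trades that exactness for a short, direct implementation of the backup equation.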