
398 Chapter 16 Planning Based on Markov Decision Processes
[Diagram: states s1, s2, s3; actions deliver to l2, move to l1, move to l2, wait; transition probabilities 1 and 0.5; l1 labeled full or empty.]
Figure 16.9 A stochastic system for continuous delivery.
the nondeterminism of the actions move(r1,l2,l1) and wait, which can lead to a state
with or without containers (full or empty) with a uniform probability distribution.
We can express the goal described informally by assigning a high reward to state
s2. Given this planning problem, the planning algorithms generate the obvious
policy {(s1, deliver to l2), (s2, wait), (s3, move(r1, l2, l1))}. This policy continuously
delivers containers to location l2 as soon as they ...
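
To make the link between the reward on s2 and the resulting policy concrete, the following is a minimal value-iteration sketch, not the book's own algorithm listing. The transition structure is an assumption reconstructed from the labels of Figure 16.9 (the diagram itself is not reproduced here): deliver to l2 is taken to lead from s1 to s3, move(r1,l2,l1) from s3 to s1 or s2 with probability 0.5 each, wait in s2 to s1 or s2 with probability 0.5 each, and the remaining actions to be deterministic. The reward value 10, the discount factor 0.9, and all Python names are likewise hypothetical choices made for illustration.

```python
# Assumed discount factor; the text does not specify one.
GAMMA = 0.9

# ASSUMED transition model: P[state][action] = [(next_state, probability), ...].
# Reconstructed from the figure's labels, not taken verbatim from the book.
P = {
    "s1": {"deliver to l2": [("s3", 1.0)], "wait": [("s1", 1.0)]},
    "s2": {"wait": [("s1", 0.5), ("s2", 0.5)], "move to l2": [("s3", 1.0)]},
    "s3": {"move(r1,l2,l1)": [("s1", 0.5), ("s2", 0.5)], "wait": [("s3", 1.0)]},
}

# High reward on s2, as in the text; the value 10 is an arbitrary assumption.
R = {"s1": 0.0, "s2": 10.0, "s3": 0.0}


def expected_value(V, outcomes):
    """Expected value of the successor state under one action."""
    return sum(prob * V[nxt] for nxt, prob in outcomes)


def value_iteration(P, R, gamma, eps=1e-9):
    """Iterate V(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) V(s') to convergence."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {
            s: R[s] + gamma * max(expected_value(V, out) for out in P[s].values())
            for s in P
        }
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < eps:
            break
    # Greedy policy: in each state, pick the action with the best expected value.
    policy = {s: max(P[s], key=lambda a: expected_value(V, P[s][a])) for s in P}
    return V, policy


V, policy = value_iteration(P, R, GAMMA)
for state in sorted(policy):
    print(state, "->", policy[state])
```

Under these assumptions the greedy policy selects deliver to l2 in s1, wait in s2, and move(r1,l2,l1) in s3, which agrees with the policy quoted in the text. A different transition model or reward assignment could of course yield a different policy.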