392 Chapter 16 Planning Based on Markov Decision Processes
algorithm will eventually terminate and return a policy. If we add the condition that
the search space is strongly connected, i.e., there is a path with positive probability
from every state to every other state, then the algorithm will eventually return an
optimal policy.
It has been shown experimentally that real-time value iteration can solve much
larger problems than standard value iteration and policy iteration [84]. The
trade-off is that the returned solution is not necessarily optimal and the
algorithm is not complete; in fact, it may not even terminate.
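To make the idea concrete, here is a minimal sketch of trial-based real-time value iteration on a toy stochastic shortest-path problem. The state space, transition table, and unit action costs are all invented for illustration; unlike standard value iteration, each trial performs Bellman updates only at the states actually visited along a simulated greedy trajectory:

```python
import random

# Hypothetical toy MDP: states 0..2 are non-goal; state 3 is the
# absorbing goal. T[s][a] lists (probability, successor) pairs;
# every action is assumed to cost 1.
T = {
    0: {"a": [(0.8, 1), (0.2, 0)], "b": [(1.0, 2)]},
    1: {"a": [(0.9, 3), (0.1, 0)]},
    2: {"a": [(0.5, 3), (0.5, 2)]},
}
GOAL = 3

def q_value(V, s, a):
    # One-step lookahead: immediate cost plus expected successor value.
    return 1.0 + sum(p * V[s2] for p, s2 in T[s][a])

def rtdp(s0, trials=2000, seed=0):
    rng = random.Random(seed)
    V = {s: 0.0 for s in list(T) + [GOAL]}  # admissible zero heuristic
    for _ in range(trials):
        s = s0
        while s != GOAL:
            # Pick the greedy action w.r.t. the current estimates.
            a = min(T[s], key=lambda act: q_value(V, s, act))
            V[s] = q_value(V, s, a)  # Bellman update at s only
            # Simulate the chosen action to select the next state.
            r, acc = rng.random(), 0.0
            for p, s2 in T[s][a]:
                acc += p
                if r <= acc:
                    s = s2
                    break
    return V

V = rtdp(0)
```

In this example the optimal expected costs are V(0) = 2.5, V(1) = 1.25, and V(2) = 2, and repeated trials drive the estimates toward those values; states that the greedy policy stops visiting (here, state 2 once action b becomes unattractive) simply stop being updated, which is exactly why the method scales but forfeits completeness guarantees on general search spaces.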
16.3 Planning under Partial Observability
Planning under partial observability in an MDP, i.e., a partially observable
MDP (POMDP), relaxes the assumption
that the controller has complete knowledge about ...