380 Chapter 16 Planning Based on Markov Decision Processes
reachability and extended goals (Section 16.4). The chapter ends with discussion
and exercises.
16.2 Planning in Fully Observable Domains
Under the hypothesis of full observability, we first introduce stochastic systems,
policies, and utility functions and formalize planning as an optimization problem
(Section 16.2.1). We then present and discuss some basic MDP planning algorithms:
policy iteration, value iteration, and real-time value iteration (Section 16.2.2).
16.2.1 Domains, Plans, and Planning Problems
Domains as Stochastic Systems. A stochastic system is a nondeterministic state-
transition system with a probability distribution on each state transition. It is a
tuple Σ = (S, A, P), where:
• S is a finite set of states. ...
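The tuple Σ = (S, A, P) can be illustrated with a small data structure. The sketch below (not from the book; the states, actions, and probabilities are hypothetical) represents P as a map from a (state, action) pair to a probability distribution over successor states, and checks that each distribution is well formed.

```python
# Minimal sketch of a stochastic system Sigma = (S, A, P).
# States, actions, and probabilities here are invented for illustration.

S = {"s0", "s1", "s2"}            # finite set of states (hypothetical)
A = {"move", "wait"}              # finite set of actions (hypothetical)

# P maps (state, action) to a distribution over successor states,
# e.g. P(s1 | s0, move) = 0.8.
P = {
    ("s0", "move"): {"s1": 0.8, "s2": 0.2},
    ("s0", "wait"): {"s0": 1.0},
    ("s1", "move"): {"s2": 1.0},
}

def check_distributions(P):
    """Each P(. | s, a) must be a probability distribution:
    nonnegative values summing to 1 over successor states."""
    for (s, a), dist in P.items():
        assert all(p >= 0.0 for p in dist.values()), (s, a)
        assert abs(sum(dist.values()) - 1.0) < 1e-9, (s, a)

check_distributions(P)
```

A dictionary-of-distributions layout like this is convenient for the planning algorithms discussed later, since evaluating a policy only requires looking up P for the (state, action) pairs the policy selects.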