
16.2 Planning in Fully Observable Domains 385
We can define the utility in a state s with an action a as V (s, a) = R(s) − C(s, a),
and the utility of a policy in a state as V (s|π ) = R(s) − C(s, π (s)). This generalizes
to histories. Let h = s_0, s_1, ... be a history. The utility of history h induced by a
policy π is defined as

    V(h|π) = Σ_{i≥0} (R(s_i) − C(s_i, π(s_i)))
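As a concrete illustration, here is a minimal sketch in Python that accumulates this sum over a finite prefix of a history. The reward function R, cost function C, and the toy domain are hypothetical stand-ins, not part of the text:

```python
def history_utility(history, policy, R, C):
    """Utility of a (finite prefix of a) history under a policy:
    the sum over i of R(s_i) - C(s_i, policy(s_i))."""
    return sum(R(s) - C(s, policy(s)) for s in history)

# Hypothetical toy domain: states are integers, the reward of a state
# is its value, and every action costs 1.
R = lambda s: s
C = lambda s, a: 1
policy = lambda s: "move"     # a single action available everywhere

history = [0, 2, 5]
print(history_utility(history, policy, R, C))  # (0-1)+(2-1)+(5-1) = 4
```

For an infinite history this sum need not exist, which is exactly the convergence problem discussed next.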
One problem with this definition is that the sum will usually not converge to a finite
value. It is important, however, that the total accumulated utility be finite; otherwise,
there is no way to compare histories. A common way to ensure a bounded measure
of utilities for infinite histories is to