
5.4 The Training Dialogue 101
these actions. Note that the user is free to choose any feasible action, not
only the one considered best by the agent.
How can the expected utility of an action be estimated? To make the
whole training process as painless as possible for the user, the agent should
suggest only those actions that actually advance the dialogue by providing
valuable information to the agent. The numerical wrapper assessment
explained in the previous section forms a good basis for estimating the effect
of an action on the expected wrapper quality. Using the formula
EU(a, u, n) = [ass(w_a) - ass(w_curr)] · P_u(a) ...