
308 Coverbal Synchrony in Human-Machine Interaction
decision policy from an associated Learner module. A decision consists
of a state-action pair: the action is the one selected, and the evidence
used in selecting it represents the state. Each actor follows its own
action-selection policy, which controls how it explores its actions;
various methods such as ε-greedy exploration, guided exploration,
or confidence value thresholds can be used (Sutton and Barto, 1998).
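
As an illustration of one of these methods, ε-greedy exploration can be sketched as follows. This is a minimal sketch rather than the system's actual implementation: the function name `epsilon_greedy`, the list of estimated action values `q_values`, and the optional `rng` parameter are all hypothetical names introduced here. With probability ε the actor tries a random action; otherwise it picks the action with the highest current estimate.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return the index of the chosen action (hypothetical helper).

    q_values: estimated value of each action in the current state (assumed input).
    epsilon:  probability of exploring, between 0 and 1.
    rng:      source of randomness; defaults to the random module.
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimate
    # (ties resolved in favour of the first such action).
    return q_values.index(max(q_values))
```

Setting `epsilon` to 0 makes the actor purely greedy, while a value of 1 makes it choose uniformly at random; intermediate values trade off the two.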
In our system, the Learner module takes the role of a critic. It
consists of the learning method, reward functions, and the decision
policy being learnt. A Learner monitors decisions ...