multimodal fission, etc.) as well as their appearance [33]. Functional factors include
the type of task (well- or ill-structured, homogeneous or heterogeneous [5]), the
number of available tasks, task complexity, task frequency, and task consequences
(particularly important for security-critical systems) [33]. For systems that are less
task-directed, characteristics of the domain (e.g., education or entertainment) become
more important. Metrics: Specification documents and task analysis methods, as in [39].
Most agent factors have to be specified by the system developer, whereas aesthetics
can be better specified by design experts or experienced salespersons. Functional
factors can be best specified by domain experts.
Context factors. These comprise the physical environment (home, office, mobile, or
public use; space, acoustic, and lighting conditions; transmission channels; potential
parallel activities of the user; privacy and security issues) as well as the service
factors (e.g., access restrictions, availability of the system, resulting costs).
Metrics: Specification documents provided by the developers of the system.
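Because the context factors are captured in specification documents, it can help to hold them in a structured record that can be checked for completeness and plausibility. The following is a minimal sketch under stated assumptions: the class names, the field names (ambient_noise_db, illuminance_lux, availability_pct, cost_per_use), and the units are all hypothetical choices for illustration, not a format prescribed by the chapter.

```python
from dataclasses import dataclass, field
from enum import Enum


class Setting(Enum):
    """Usage settings listed among the context factors."""
    HOME = "home"
    OFFICE = "office"
    MOBILE = "mobile"
    PUBLIC = "public"


@dataclass
class PhysicalEnvironment:
    setting: Setting
    ambient_noise_db: float           # acoustic conditions (hypothetical field)
    illuminance_lux: float            # lighting conditions (hypothetical field)
    transmission_channel: str         # e.g., "wifi" or "cellular"
    parallel_activities: list[str] = field(default_factory=list)
    privacy_sensitive: bool = False   # privacy and security issues


@dataclass
class ServiceFactors:
    access_restricted: bool
    availability_pct: float           # share of time the system is available
    cost_per_use: float               # resulting costs, in an assumed currency unit


@dataclass
class ContextSpecification:
    environment: PhysicalEnvironment
    service: ServiceFactors

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the record is plausible."""
        problems = []
        if self.environment.illuminance_lux < 0:
            problems.append("illuminance must be non-negative")
        if not 0.0 <= self.service.availability_pct <= 100.0:
            problems.append("availability must be between 0 and 100 percent")
        if self.service.cost_per_use < 0:
            problems.append("cost must be non-negative")
        return problems


spec = ContextSpecification(
    PhysicalEnvironment(Setting.PUBLIC, 65.0, 300.0, "cellular",
                        ["walking"], privacy_sensitive=True),
    ServiceFactors(access_restricted=True, availability_pct=99.5, cost_per_use=0.0),
)
print(spec.validate())  # []
```

Keeping physical-environment and service factors in separate records mirrors the two groups named in the text, so either can be reused across several deployment scenarios.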
It is during the interaction that the perception and judgment processes forming
quality take place. Interaction performance aspects are organized into two cycles,
ordered according to the processing step in which they occur.
System interaction performance aspects include:
Input performance. This can be quantified, for example, in terms of accuracy or
error rate, as is common practice for speech recognizers, gesture recognizers,
and facial expression recognizers. In addition, the degree to which the system
covers the user’s behavior (vocabulary, gestures, expressions) and the system’s
real-time performance are indicators of input performance. For special multimodal
input such as face detection and person or hand tracking, [37] provides metrics
and a corpus for evaluating system components in a comparable way.
Input modality appropriateness. This can be judged on a theoretical basis, for
example, with the help of modality properties, as was proposed in [4]. The user’s context
has to be taken into account when determining appropriateness; for example,
spoken input is inappropriate for secret information like a PIN in public spaces.
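The PIN example can be expressed as a small rule-based check that combines properties of an input modality with the user’s context. This is a minimal sketch in the spirit of the modality properties of [4], not the property set from that work: the two properties (audible_to_bystanders, requires_hands), the context fields, and the rules are hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modality:
    name: str
    audible_to_bystanders: bool  # hypothetical property: can others overhear the input?
    requires_hands: bool         # hypothetical property: does input need the user's hands?


@dataclass(frozen=True)
class UsageContext:
    public_space: bool
    hands_busy: bool             # e.g., a parallel activity such as carrying something


SPEECH = Modality("speech", audible_to_bystanders=True, requires_hands=False)
KEYPAD = Modality("keypad", audible_to_bystanders=False, requires_hands=True)


def assess_input_modality(modality: Modality, context: UsageContext,
                          secret_input: bool) -> tuple[bool, list[str]]:
    """Return (appropriate?, reasons against) for one modality in one context."""
    reasons = []
    if secret_input and context.public_space and modality.audible_to_bystanders:
        reasons.append(f"{modality.name} can be overheard when entering secret data in public")
    if context.hands_busy and modality.requires_hands:
        reasons.append(f"{modality.name} needs the user's hands, which are busy")
    return (not reasons, reasons)


in_public = UsageContext(public_space=True, hands_busy=False)
print(assess_input_modality(SPEECH, in_public, secret_input=True)[0])  # False
print(assess_input_modality(KEYPAD, in_public, secret_input=True)[0])  # True
```

Returning the reasons alongside the verdict keeps the judgment inspectable, which matters when appropriateness is argued on a theoretical basis rather than measured.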
Interpretation performance. This can be quantified in terms of accuracy when a
limited set of underlying semantic concepts is used for meaning description.
An example is counting the errors made in filling in attribute–value pairs,
compared against a correct interpretation derived by an expert. Such measures exist
for independent input modalities [24]. However, the performance of the modality
fusion component should also be considered, by incorporating synchronicity and
redundancy into such a measure.
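Counting attribute–value errors against an expert-derived reference can be sketched as follows, assuming each attribute occurs at most once per interpretation so that both interpretations fit in a dictionary. A wrong value for a reference attribute counts as a substitution, a missing attribute as a deletion, and an extra attribute as an insertion; their sum divided by the number of reference pairs gives a concept error rate, analogous to WER. The function names and the example pairs are hypothetical, and the sketch covers a single modality only; it does not address the synchronicity and redundancy aspects of fusion.

```python
def concept_errors(reference: dict[str, str], hypothesis: dict[str, str]) -> dict[str, int]:
    """Count attribute-value errors of a hypothesis against an expert reference."""
    substitutions = sum(
        1 for attr, value in reference.items()
        if attr in hypothesis and hypothesis[attr] != value
    )
    deletions = sum(1 for attr in reference if attr not in hypothesis)
    insertions = sum(1 for attr in hypothesis if attr not in reference)
    return {"substitutions": substitutions, "deletions": deletions, "insertions": insertions}


def concept_error_rate(reference: dict[str, str], hypothesis: dict[str, str]) -> float:
    """(S + D + I) / number of reference attribute-value pairs."""
    if not reference:
        raise ValueError("reference interpretation must contain at least one pair")
    return sum(concept_errors(reference, hypothesis).values()) / len(reference)


reference = {"action": "switch_on", "device": "lamp", "room": "kitchen"}
hypothesis = {"action": "switch_on", "device": "lamp", "room": "bedroom", "level": "50"}
print(concept_errors(reference, hypothesis))
# {'substitutions': 1, 'deletions': 0, 'insertions': 1}
print(round(concept_error_rate(reference, hypothesis), 3))  # 0.667
```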
Dialogue management performance. This can be defined depending on the function
of interest. The dialogue manager’s main function, driving the dialogue toward the
intended goal, can be assessed only indirectly, in terms of dialogue
352 CHAPTER 14 Evaluation of Multimodal Interfaces for Ambient Intelligence
