Chapter | 12 Multimodal Input
The close relationship between multimodality and cognition exists
on a number of levels. First, psychological and cognitive theories of
multimodal processing, together with recent neurological evidence,
can provide a biological and evolutionary perspective to explain the
advantages of multimodal interaction. The classic example is the
study of interpersonal dialogue. Second, the literature reveals that
cognitive processes and user performance are closely linked. The
capacity of working memory can impose severe limitations on
performance and learning. Cognitive load theory attempts to break down the
relationships between stimulus materials and performance and
identifies multimodal processing as a method for improving performance
in tasks that induce high mental demand.
12.3.1 Multimodal Perception and Cognition
Human beings are physiologically designed to acquire and produce
information through a number of different modalities: the human
communication channel is made up of sensory organs, the central
nervous system, various parts of the brain and effectors (muscles or
glands) [26]. Sensory inputs from specific modalities each have their
own individual pathway into a primary sensory cortex and can be
processed in parallel [26]. In addition, there are specific multimodal
integration (input) and diffusion (output) association areas in the
brain that are highly interconnected [26]. Adding credence to the raft
of empirical evidence, neuroimaging technologies such as positron
emission tomography and functional magnetic resonance imaging
have been used to identify separate locations for the verbal/auditory,
imagery/spatial and executive functions of working memory [27, 28].
Multimodal perception and cognition structures in the human brain
appear to have been specifically designed to collate and produce
multimodal information. This is nowhere more apparent than in
studies of interpersonal dialogue.
The multimodal aspects of conversational speech and gesture have
received much attention in the literature. Communication is inherently
multimodal: we talk to one another, wave our arms in the air and make
eye contact with our dialogue partners, who also perceive, interpret
and understand the protocols of conversational dialogue [29, 30].
Although speech and gesture channels do not always convey the
same information, what they convey is always semantically and pragmatically
compatible [30]. The most prevalent and widely accepted theory on the
relationship between these modes of communication sees speech
and gesture production as an integrated process, generated from a
common underlying mental representation. Both modes are therefore
considered to be equally functional in creating communicative
meaning [29]. In this way, human communication can be seen to exploit our
natural ability to easily process and produce multimodal information.
Such a principle can be directly applied to the design and
implementation of multimodal interfaces by allowing users to make flexible
use of the entire gamut of modal productions (e.g., gaze, gesture
and speech).
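As a rough illustration of this design principle, the sketch below lets a
user trigger the same application command through any of several modalities.
It is a minimal sketch and not a method described in this chapter: the class
names, modality labels, recogniser tokens and the NEXT_PAGE command are all
hypothetical, and real recognisers for speech, gesture and gaze are assumed
to exist upstream and to deliver already-recognised tokens.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class InputEvent:
    """One recognised user action (all labels here are hypothetical)."""
    modality: str  # e.g. "speech", "gesture" or "gaze"
    token: str     # recogniser output, e.g. a phrase or a gesture name


class CommandMap:
    """Maps (modality, token) pairs to application commands."""

    def __init__(self) -> None:
        self._bindings: Dict[Tuple[str, str], str] = {}

    def bind(self, modality: str, token: str, command: str) -> None:
        self._bindings[(modality, token)] = command

    def interpret(self, event: InputEvent) -> Optional[str]:
        # Returns None when the event has no binding.
        return self._bindings.get((event.modality, event.token))


# The same command is reachable through three different modalities,
# so the user is free to choose whichever is most convenient.
commands = CommandMap()
commands.bind("speech", "next page", "NEXT_PAGE")
commands.bind("gesture", "swipe_left", "NEXT_PAGE")
commands.bind("gaze", "dwell_on_next_button", "NEXT_PAGE")

result = commands.interpret(InputEvent("gesture", "swipe_left"))
```

Keeping the bindings in a single table means that adding a further modality
only requires new entries, not changes to the application logic. The sketch
handles one event at a time and does not attempt to fuse simultaneous input
from several modalities.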
12.3.2 Cognitive Load and Performance
The relationship between working memory and performance is
explored by Sweller's cognitive load theory. The theory is driven
by empirical observations of how well people are able to learn from
different stimulus materials and corresponding hypotheses based on
well-established modal models of working memory architectures
such as Baddeley’s model [16] (see Section 12.4.1 for details).
Cognitive load is a concept that attempts to describe the experience
of mental demand, adding an interesting dimension to performance
assessment. The theory rests on the assumption that working memory
is limited in capacity [31] and duration [32]. Tasks with very high
or very low levels of cognitive load can severely impact a subject’s
performance: if too high, the subject will not have sufficient resources
to perform well, and if too low, there is a chance that the subject is not
being cognitively engaged in an optimal way [33]. Hence, effective
use of working memory processing is vital for achieving successful
knowledge transfer. Subjects exhibiting similar levels of performance
may also differ in their individual experience of cognitive load. The
theory describes three types of cognitive load that contribute to mental
demand [34]:
