Contents
Preface xiii
1. Introduction 1
Jean-Philippe Thiran, Ferran Marqués, and
Hervé Bourlard
Part I
Signal Processing, Modelling and Related
Mathematical Tools 5
2. Statistical Machine Learning for HCI 7
Samy Bengio
2.1. Introduction 7
2.2. Introduction to Statistical Learning 8
2.2.1. Types of Problem 8
2.2.2. Function Space 9
2.2.3. Loss Functions 10
2.2.4. Expected Risk and Empirical Risk 10
2.2.5. Statistical Learning Theory 11
2.3. Support Vector Machines for Binary Classification 13
2.4. Hidden Markov Models for Speech Recognition 16
2.4.1. Speech Recognition 17
2.4.2. Markovian Processes 17
2.4.3. Hidden Markov Models 18
2.4.4. Inference and Learning with HMMs 20
2.4.5. HMMs for Speech Recognition 22
2.5. Conclusion 22
References 23
3. Speech Processing 25
Thierry Dutoit and Stéphane Dupont
3.1. Introduction 26
3.2. Speech Recognition 28
3.2.1. Feature Extraction 28
3.2.2. Acoustic Modelling 30
3.2.3. Language Modelling 33
3.2.4. Decoding 34
3.2.5. Multiple Sensors 35
3.2.6. Confidence Measures 37
3.2.7. Robustness 38
3.3. Speaker Recognition 40
3.3.1. Overview 40
3.3.2. Robustness 43
3.4. Text-to-Speech Synthesis 44
3.4.1. Natural Language Processing for Speech
Synthesis 44
3.4.2. Concatenative Synthesis with a Fixed Inventory 46
3.4.3. Unit Selection-Based Synthesis 50
3.4.4. Statistical Parametric Synthesis 53
3.5. Conclusions 56
References 57
4. Natural Language and Dialogue Processing 63
Olivier Pietquin
4.1. Introduction 63
4.2. Natural Language Understanding 64
4.2.1. Syntactic Parsing 64
4.2.2. Semantic Parsing 68
4.2.3. Contextual Interpretation 70
4.3. Natural Language Generation 71
4.3.1. Document Planning 72
4.3.2. Microplanning 73
4.3.3. Surface Realisation 73
4.4. Dialogue Processing 74
4.4.1. Discourse Modelling 74
4.4.2. Dialogue Management 77
4.4.3. Degrees of Initiative 80
4.4.4. Evaluation 81
4.5. Conclusion 85
References 85
5. Image and Video Processing Tools for HCI 93
Montse Pardàs, Verónica Vilaplana and
Cristian Canton-Ferrer
5.1. Introduction 93
5.2. Face Analysis 94
5.2.1. Face Detection 95
5.2.2. Face Tracking 96
5.2.3. Facial Feature Detection and Tracking 98
5.2.4. Gaze Analysis 100
5.2.5. Face Recognition 101
5.2.6. Facial Expression Recognition 103
5.3. Hand-Gesture Analysis 104
5.4. Head Orientation Analysis and FoA Estimation 106
5.4.1. Head Orientation Analysis 106
5.4.2. Focus of Attention Estimation 107
5.5. Body Gesture Analysis 109
5.6. Conclusions 112
References 112
6. Processing of Handwriting and Sketching
Dynamics 119
Claus Vielhauer
6.1. Introduction 119
6.2. History of Handwriting Modality and the
Acquisition of Online Handwriting Signals 121
6.3. Basics in Acquisition, Examples for Sensors 123
6.4. Analysis of Online Handwriting and Sketching
Signals 124
6.5. Overview of Recognition Goals in HCI 125
6.6. Sketch Recognition for User Interface Design 128
6.7. Similarity Search in Digital Ink 133
6.8. Summary and Perspectives for Handwriting and
Sketching in HCI 138
References 139
Part II
Multimodal Signal Processing and
Modelling 143
7. Basic Concepts of Multimodal Analysis 145
Mihai Gurban and Jean-Philippe Thiran
7.1. Defining Multimodality 145
7.2. Advantages of Multimodal Analysis 148
7.3. Conclusion 151
References 152
8. Multimodal Information Fusion 153
Norman Poh and Josef Kittler
8.1. Introduction 153
8.2. Levels of Fusion 156
8.3. Adaptive versus Non-Adaptive Fusion 158
8.4. Other Design Issues 162
8.5. Conclusions 165
References 165
9. Modality Integration Methods 171
Mihai Gurban and Jean-Philippe Thiran
9.1. Introduction 171
9.2. Multimodal Fusion for AVSR 172
9.2.1. Types of Fusion 172
9.2.2. Multistream HMMs 174
9.2.3. Stream Reliability Estimates 174
9.3. Multimodal Speaker Localisation 178
9.4. Conclusion 181
References 181
10. A Multimodal Recognition Framework for
Joint Modality Compensation and Fusion 185
Konstantinos Moustakas, Savvas Argyropoulos and
Dimitrios Tzovaras
10.1. Introduction 186
10.2. Joint Modality Recognition and Applications 188
10.3. A New Joint Modality Recognition Scheme 191
10.3.1. Concept 191
10.3.2. Theoretical Background 191
10.4. Joint Modality Audio-Visual Speech Recognition 194
10.4.1. Signature Extraction Stage 196
10.4.2. Recognition Stage 197
10.5. Joint Modality Recognition in Biometrics 198
10.5.1. Overview 198
10.5.2. Results 199
10.6. Conclusions 203
References 204
11. Managing Multimodal Data, Metadata and
Annotations: Challenges and Solutions 207
Andrei Popescu-Belis
11.1. Introduction 208
11.2. Setting the Stage: Concepts and Projects 208
11.2.1. Metadata versus Annotations 209
11.2.2. Examples of Large Multimodal Collections 210
11.3. Capturing and Recording Multimodal Data 211
11.3.1. Capture Devices 211
11.3.2. Synchronisation 212
11.3.3. Activity Types in Multimodal Corpora 213
11.3.4. Examples of Set-ups and Raw Data 213
11.4. Reference Metadata and Annotations 214
11.4.1. Gathering Metadata: Methods 215
11.4.2. Metadata for the AMI Corpus 216
11.4.3. Reference Annotations: Procedure
and Tools 217
11.5. Data Storage and Access 219
11.5.1. Exchange Formats for Metadata and
Annotations 219
11.5.2. Data Servers 221
11.5.3. Accessing Annotated Multimodal Data 222
11.6. Conclusions and Perspectives 223
References 224
Part III
Multimodal Human–Computer and
Human-to-Human Interaction 229
12. Multimodal Input 231
Natalie Ruiz, Fang Chen, and Sharon Oviatt
12.1. Introduction 231
12.2. Advantages of Multimodal Input Interfaces 232
12.2.1. State-of-the-Art Multimodal Input Systems 234
12.3. Multimodality, Cognition and Performance 237
12.3.1. Multimodal Perception and Cognition 237
12.3.2. Cognitive Load and Performance 238
12.4. Understanding Multimodal Input Behaviour 239
12.4.1. Theoretical Frameworks 240
12.4.2. Interpretation of Multimodal Input Patterns 243
12.5. Adaptive Multimodal Interfaces 245
12.5.1. Designing Multimodal Interfaces that
Manage Users’ Cognitive Load 246
12.5.2. Designing Low-Load Multimodal Interfaces
for Education 248
12.6. Conclusions and Future Directions 250
References 251
13. Multimodal HCI Output: Facial Motion, Gestures
and Synthesised Speech Synchronisation 257
Igor S. Pandžić
13.1. Introduction 257
