1

Introduction

Tuomas Virtanen1, Rita Singh2, Bhiksha Raj2

1Tampere University of Technology, Finland 2Carnegie Mellon University, USA

1.1 Scope of the Book

The term “computer speech recognition” conjures up visions of the science-fiction capabilities of HAL2000 in 2001, A Space Odessey, or “Data,” the anthropoid robot in Star Trek, who can communicate through speech with as much ease as a human being. However, our real-life encounters with automatic speech recognition are usually rather less impressive, comprising often-annoying exchanges with interactive voice response, dictation, and transcription systems that make many mistakes, frequently misrecognizing what is spoken in a way that humans rarely would. The reasons for these mistakes are many. Some of the reasons have to do with fundamental limitations of the mathematical framework employed, and inadequate awareness or representation of context, world knowledge, and language. But other equally important sources of error are distortions introduced into the recorded audio during recording, transmission, and storage.

As automatic speech-recognition—or ASR—systems find increasing use in everyday life, the speech they must recognize is being recorded over a wider variety of conditions than ever before. It may be recorded over a variety of channels, including landline and cellular phones, the internet, etc. using different kinds of microphones, which may be placed close to the mouth such as in head-mounted microphones or telephone ...

Get Techniques for Noise Robustness in Automatic Speech Recognition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.