2.6. Audio-Based Feature Extraction and Pattern Classification

Audio-based feature extraction parameterizes a speech signal into a sequence of feature vectors that are less redundant than the raw waveform and therefore better suited to statistical modeling. Although speech signals are nonstationary, their short-term segments can be treated as approximately stationary. Classical signal processing techniques, such as spectral and cepstral analysis, can therefore be applied to short segments of speech on a frame-by-frame basis.
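The frame-by-frame idea can be illustrated with a minimal sketch in Python using NumPy. It is not the book's implementation. The function names (`frame_signal`, `short_term_features`), the 25 ms frame length, the 10 ms frame shift, the Hamming window, and the choice of 12 cepstral coefficients are all assumptions made for illustration. Each frame is windowed, its magnitude spectrum is computed with an FFT, and the real cepstrum is taken as the inverse FFT of the log magnitude spectrum.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds samples [i * hop, i * hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def short_term_features(signal, sample_rate, frame_ms=25.0, hop_ms=10.0, n_ceps=12):
    """Return per-frame magnitude spectra and low-order real cepstra.

    frame_ms, hop_ms, and n_ceps are illustrative defaults (assumptions),
    not values taken from the text.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))

    # Treat each short frame as approximately stationary and taper it
    # with a Hamming window to reduce spectral leakage.
    frames = frame_signal(signal, frame_len, hop) * np.hamming(frame_len)

    # Spectral analysis: magnitude of the one-sided FFT of each frame.
    spectrum = np.abs(np.fft.rfft(frames, axis=1))

    # Cepstral analysis: inverse FFT of the log magnitude spectrum.
    # The small constant guards against log(0).
    log_spectrum = np.log(spectrum + 1e-10)
    cepstrum = np.fft.irfft(log_spectrum, n=frame_len, axis=1)

    # Keep coefficients 1..n_ceps and drop the 0th (overall log energy) term.
    return spectrum, cepstrum[:, 1:n_ceps + 1]
```

Under these assumptions, a one-second signal sampled at 8 kHz is split into 200-sample frames that advance by 80 samples, so each row of the returned arrays is the feature vector for one frame.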

Speakers are well known to differ in both their physiological and behavioral characteristics. While the physiological differences (e.g., vocal tract shape) cause low-level spectral features to vary among speakers, the behavioral differences ...
