September 2004
Intermediate to advanced
496 pages
Audio-based feature extraction parameterizes a speech signal into a sequence of feature vectors that are less redundant than the raw waveform and therefore better suited to statistical modeling. Although speech signals are nonstationary, their short-term segments can be treated as stationary. Classical signal processing techniques, such as spectral and cepstral analysis, can therefore be applied to short segments of speech on a frame-by-frame basis.
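The frame-by-frame analysis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the 8 kHz sample rate, the 20 ms frame length (160 samples), the 10 ms hop (80 samples), the Hamming window, the choice of 12 coefficients, and all function names are hypothetical choices made for this example. It uses a naive DFT from the standard library so that each step stays visible, and it computes a plain real cepstrum rather than any particular feature set.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window of length n; tapers frame edges to reduce leakage."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(x):
    """Naive O(N^2) discrete Fourier transform, kept simple for clarity."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, num_coeffs):
    """Real cepstrum of one frame: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    # Small constant avoids log(0) for empty spectral bins.
    log_mag = [math.log(abs(c) + 1e-10) for c in dft(windowed)]
    # log_mag is real and symmetric for a real frame, so the inverse DFT
    # reduces to a cosine sum and the result is real valued.
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n)
                for k in range(n)) / n
            for q in range(num_coeffs)]


def extract_features(signal, frame_len=160, hop=80, num_coeffs=12):
    """Turn a signal into a sequence of per-frame cepstral feature vectors."""
    return [real_cepstrum(frame, num_coeffs)
            for frame in frame_signal(signal, frame_len, hop)]


# Hypothetical test signal: 0.1 s of a 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(800)]
features = extract_features(signal)
print(len(features), len(features[0]))
```

With these assumed settings, the 800-sample signal yields 9 overlapping frames, each reduced from 160 samples to a 12-dimensional vector. In practice a fast Fourier transform would replace the naive `dft`, which is used here only to keep the sketch self-contained.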
The physiological and behavioral characteristics of speakers are well known to differ from one individual to another. While physiological differences (e.g., vocal tract shape) cause low-level spectral features to vary among speakers, the behavioral differences ...