6
Pitch Estimation and Voiced–Unvoiced Classification of Speech
6.1 Introduction
Low bit-rate speech coders, traditionally called vocoders, rely heavily on extracting the correct speech parameters from a given speech segment. The three main speech features are the spectral envelope, the pitch and the voiced–unvoiced classification. The spectral envelope is usually extracted by a standard autocorrelation method, which yields a linear predictive (LP) parameter representation. However, extracting the correct pitch and voicing classification is not as straightforward and may require a combination of methods.
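As a rough illustration of the autocorrelation method mentioned above, the following minimal sketch computes the autocorrelation of a frame and solves the resulting normal equations with the Levinson-Durbin recursion. The function names, the sign convention A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p, and the absence of windowing are assumptions made for illustration, not details taken from this chapter.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.

    Returns (a, err): a[0] = 1 and a[1..order] are the coefficients of
    the inverse filter A(z) = 1 + sum_k a[k] z^-k (assumed convention);
    err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Usage: autocorrelation values of an ideal first-order process
# with coefficient 0.5 (hypothetical test data).
coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

In practice a frame is normally windowed before the autocorrelation is computed, and the model order is chosen to suit the sampling rate; both choices are left out of this sketch.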
When measuring the pitch, it is assumed that voiced signals are formed by passing a quasi-periodic excitation signal through the LPC filter. The duration between the pulses in the excitation signal is called the pitch period T0, and its reciprocal is the fundamental frequency f0 = 1/T0. Correct estimation of the pitch is essential for good-quality speech coding; an incorrect pitch period can seriously degrade the quality of the synthesized speech. Pitch determination algorithms (PDAs) have been studied in both the time and frequency domains, and a comparison is discussed in [1, 2]. Traditionally, autocorrelation-based methods [3] and their variants [4, 5] have been intensively investigated and widely applied to various speech coders [6–11]. Frequency domain approaches [12–14], on the other hand, have become popular recently due to the growing interest in sinusoidal speech coders, such as the ...
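To make the autocorrelation-based idea concrete, here is a minimal sketch that picks the lag with the largest normalized autocorrelation inside a plausible pitch range and converts it to a fundamental frequency. The function name, the 60 to 400 Hz search range, and the toy impulse-train input are assumptions for illustration; the sketch is not the specific algorithm of any of the cited references.

```python
def estimate_pitch_autocorr(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate the pitch period T0 (in samples) and f0 (in Hz).

    Searches lags between fs/f0_max and fs/f0_min (assumed range) for
    the peak of the energy-normalized, biased autocorrelation.
    Returns (None, None) for a silent or too-short frame.
    """
    n = len(frame)
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), n - 1)
    energy = sum(x * x for x in frame)
    if energy == 0.0 or lag_min < 1 or lag_min >= lag_max:
        return None, None
    best_lag, best_r = None, -1.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, fs / best_lag


# Usage: a toy excitation with one pulse every 80 samples at 8 kHz,
# i.e. T0 = 10 ms and f0 = 100 Hz.
fs = 8000
frame = [1.0 if i % 80 == 0 else 0.0 for i in range(320)]
t0_samples, f0_hz = estimate_pitch_autocorr(frame, fs)
print(t0_samples, f0_hz)  # prints: 80 100.0
```

On real speech the autocorrelation peak is far less clean than in this toy case, and such an estimator can lock onto multiples or submultiples of the true pitch period, which is one reason a combination of methods may be needed.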