October 2015
Intermediate to advanced
306 pages
10h 38m
English
Fig. 1.1 From thoughts to speech.
Fig. 2.1 Illustration of the CD-DNN-HMM and its three core components.
Fig. 2.2 Illustration of the CNN in which the convolution is applied along frequency bands.
Fig. 3.1 A model of acoustic environment distortion in the discrete-time domain relating the clean speech sample x[m] to the distorted speech sample y[m].
Fig. 3.2 Cepstral distribution of the word "oh" in Aurora 2.
Fig. 3.3 Impact of noise with mean values varying from 5 in (a) to 25 in (d) in the log-Mel-filter-bank domain. The clean speech has a mean value of 25 and a standard deviation of 10. The noise has a standard deviation of 2.
Fig. 3.4 Impact of noise with different standard deviation values in the log-Mel-filter-bank ...
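The numerical setup in the caption of Fig. 3.3 (clean log-Mel values with mean 25 and standard deviation 10, noise with standard deviation 2 and a varying mean) can be approximated with a short simulation. This is a minimal sketch, not the book's code. It assumes the simplified log-domain combination y = log(exp(x) + exp(n)), which ignores any channel and phase terms. The function name `simulate_log_mel_noise` is hypothetical. The caption gives only the endpoint noise means (5 and 25), so only those two are used here.

```python
import numpy as np


def simulate_log_mel_noise(noise_mean, clean_mean=25.0, clean_std=10.0,
                           noise_std=2.0, size=100_000, seed=0):
    """Sample clean-speech and noise log-Mel values and combine them.

    Assumption: simplified mismatch y = log(exp(x) + exp(n)),
    with no channel term and no phase term.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(clean_mean, clean_std, size)  # clean speech (log-Mel)
    n = rng.normal(noise_mean, noise_std, size)  # additive noise (log-Mel)
    y = np.logaddexp(x, n)                       # numerically stable log(e^x + e^n)
    return x, n, y


if __name__ == "__main__":
    # Endpoint noise means from the caption; intermediate panels are not specified.
    for noise_mean in (5.0, 25.0):
        x, n, y = simulate_log_mel_noise(noise_mean)
        print(f"noise mean {noise_mean:4.1f}: "
              f"clean mean/std {x.mean():.2f}/{x.std():.2f}, "
              f"noisy mean/std {y.mean():.2f}/{y.std():.2f}")
```

Because log(e^x + e^n) always lies between max(x, n) and max(x, n) + log 2, low-energy speech frames are pushed up toward the noise level. As the noise mean approaches the clean mean, the noisy distribution shifts upward and narrows, while a low noise mean leaves it nearly unchanged.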