January 2019
Intermediate to advanced
386 pages
11h 13m
English
This section describes how RNNs can be used to model sequential data. The problem with the straightforward application of RNNs on speech recognition is that the labels of the training data need to be perfectly aligned with the input. If the data isn't aligned well, then the input to output mapping will contain too much noise for the network to learn anything. Some early attempts try to model the sequential context of the acoustic features by using hybrid RNN-HMM models, where the RNNs would model the emission probabilities of the HMM models, much in the same way that DBNs are used (http://www.cstr.ed.ac.uk/downloads/publications/1996/rnn4csr96.pdf).
Later experiments tried to train LSTMs to output the posterior probability ...