January 2019
Intermediate to advanced
386 pages
11h 13m
English
We want to conclude this chapter by mentioning end-to-end techniques. Deep learning methods, such as CTC (http://www.jmlr.org/proceedings/papers/v32/graves14.pdf and https://arxiv.org/abs/1512.02595) and attention models (https://arxiv.org/abs/1508.01211) have allowed us to learn the full speech recognition pipeline in an end-to-end fashion. They do so without modeling phonemes explicitly. This means that these end-to-end models will learn acoustic and language models in one single model and directly output a distribution over words. These models illustrate the power of deep learning by combining everything in one model; with this, the model becomes conceptually easier to understand.