Speech recognition
In the previous sections, we saw how RNNs can be used to learn patterns in many different kinds of time sequences. In this section, we will look at how these models can be applied to the problem of recognizing and understanding speech. We will give a brief overview of the speech recognition pipeline and a high-level view of how neural networks can be used in each part of it. For more detail on the methods discussed in this section, please see the references.
Speech recognition pipeline
Speech recognition tries to find the most probable word sequence (the transcription) given the acoustic observations; this is represented by the following:
transcription = argmax( P(words | audio features)) ...
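To make the argmax above concrete, here is a minimal sketch in Python. It assumes we already have a small set of candidate word sequences, each with a score standing in for log P(words | audio features). The candidate strings, their probabilities, and the function name `most_probable_transcription` are all hypothetical and chosen only for illustration; a real recognizer would obtain these scores from its models and would search a far larger space of word sequences.

```python
import math


def most_probable_transcription(log_posteriors):
    """Return the candidate with the highest log P(words | audio features).

    `log_posteriors` maps each candidate word sequence (a string) to its
    log-probability. This is a hypothetical interface, used for illustration.
    """
    return max(log_posteriors, key=log_posteriors.get)


# Hypothetical scores for a single utterance (made-up numbers).
candidates = {
    "recognize speech": math.log(0.6),
    "wreck a nice beach": math.log(0.3),
    "recognize peach": math.log(0.1),
}

best = most_probable_transcription(candidates)
print(best)
```

Log-probabilities are used here because the logarithm is monotonic, so the argmax is unchanged, and because products of many small probabilities are numerically safer to handle as sums of logs.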