Chapter 7. Recurrent Neural Networks for Natural Language Processing

In Chapter 5 you saw how to tokenize and sequence text, turning sentences into tensors of numbers that could then be fed into a neural network. You then extended that in Chapter 6 by looking at embeddings, a way to have words with similar meanings cluster together to enable the calculation of sentiment. This worked really well, as you saw by building a sarcasm classifier. But that approach has a limitation: sentences aren’t just bags of words, and often the order in which the words appear dictates their overall meaning. Adjectives can add to or change the meaning of the nouns they appear beside. For example, the word “blue” might be meaningless from a sentiment perspective, as might “sky,” but when you put them together to get “blue sky” there’s a clear sentiment, and it’s usually positive. And some nouns may qualify others, such as “rain cloud,” “writing desk,” or “coffee mug.”

To take sequences like this into account, an additional approach is needed, and that is to factor recurrence into the model architecture. In this chapter you’ll look at different ways of doing this. We’ll explore how sequence information can be learned, and how this information can be used to create a type of model that is better able to understand text: the recurrent neural network (RNN).
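To give you a sense of where this is heading, here is a minimal sketch of an RNN-based text classifier, assuming the same TensorFlow/Keras setup used in the previous chapters. The vocabulary size, sequence length, and layer sizes are illustrative placeholders rather than the values used later in the book, and the SimpleRNN layer simply stands in for the recurrent layers you’ll explore in this chapter:

```python
import tensorflow as tf

# Illustrative placeholder values; the chapter's own examples may differ.
vocab_size = 10000     # size of the tokenizer vocabulary
embedding_dim = 16     # dimensionality of each word embedding
max_length = 100       # padded length of each input sequence

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_length,)),
    # Map token IDs to dense vectors, as in Chapter 6
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # A recurrent layer reads the embeddings in order, carrying its
    # internal state from one word to the next
    tf.keras.layers.SimpleRNN(32),
    tf.keras.layers.Dense(24, activation='relu'),
    # Single sigmoid unit for a binary label such as sarcastic/not sarcastic
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
```

Structurally, the only change from the pooling-based classifier in Chapter 6 is that the layer after the embedding now processes the sequence step by step instead of collapsing it, which is what allows the model to take word order into account.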

The Basis of Recurrence

To understand how recurrence might work, let’s first consider the limitations of the models used thus far in the ...
