Chapter 5. Text Classification

We’re leaving images behind for now and turning our attention to another area where deep learning has proven to be a significant advance on traditional techniques: natural language processing (NLP). A good example of this is Google Translate. Originally, the code that handled translation was a weighty 500,000 lines; the new, TensorFlow-based system has approximately 500, and it performs better than the old approach.

Recent breakthroughs have also occurred in bringing transfer learning (which you learned about in Chapter 4) to NLP problems. New architectures such as the Transformer have led to the creation of networks like OpenAI’s GPT-2, the larger variant of which produces text that is almost human-like in quality (and in fact, OpenAI has not released the weights of that model for fear of it being used maliciously).

This chapter provides a whirlwind tour of recurrent neural networks and embeddings. Then we explore the torchtext library and how to use it for text processing with an LSTM-based model.

Recurrent Neural Networks

If we look back at how we’ve been using our CNN-based architectures so far, we can see that they have always operated on one complete snapshot of data at a time. But consider these two sentence fragments:

The cat sat on the mat.

She got up and impatiently climbed on the chair, meowing for food.

Say you were to feed those two sentences, one after the other, into a CNN and ask, where is the cat? You’d have a problem, because the network has no concept of memory: by the time it processes the second sentence, it has no record of the first, and so no way of connecting she back to the cat. Data with a temporal dimension, such as text, needs an architecture that can carry information forward from one step to the next.
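To make the idea of carrying state forward concrete, here is a minimal sketch of a recurrent layer in PyTorch. It is not the model we build later in the chapter; the layer sizes and the random input are arbitrary placeholders. The point is simply that the hidden state returned by the layer summarizes everything the network has seen so far in the sequence.

    import torch
    import torch.nn as nn

    # A recurrent layer keeps a hidden state that is updated at every time step,
    # so information from early in the sequence is still available later on.
    rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

    sequence = torch.randn(1, 6, 8)   # batch of 1, 6 time steps, 8 features per step
    hidden = torch.zeros(1, 1, 16)    # initial hidden state, all zeros

    output, hidden = rnn(sequence, hidden)
    print(output.shape)  # torch.Size([1, 6, 16]) -- one output vector per time step
    print(hidden.shape)  # torch.Size([1, 1, 16]) -- the final hidden state

A CNN given those six time steps would see them only as one fixed block of input; the recurrent layer, by contrast, threads its hidden state through the steps one by one, which is exactly what our cat example requires.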
