Chapter 6. Recurrent Neural Networks and Other Sequence Models

One of the big themes of this book so far has been transformers. In fact, almost every model we have trained has been a member or relative of the transformer family. Even the tokenizers we built and used were constructed with specific transformer architectures in mind.

But transformers aren’t the only model in town.

Transformers themselves are relatively recent—the original paper by Vaswani et al.1 was first published on arXiv in June 2017 (eons ago in the deep learning community but not too long ago in the span of human history). Before then, people weren’t really using transformers. So what was the alternative?

Recurrent neural networks (RNNs) were the name of the game back in the day. With all of our talk about how transformers and transfer learning have revolutionized the field, we might have given you the (false) impression that NLP wasn’t really a thing until BERT came out. This is most certainly not the case.

RNNs and their variants were the convolutional neural networks (CNNs) of NLP. In 2015, if you wanted to learn deep learning, most courses introduced CNNs as the "solution" for vision and RNNs as the "solution" for NLP. Perhaps the most salient example of the 2015 RNN hype was Andrej Karpathy's blog post "The Unreasonable Effectiveness of Recurrent Neural Networks," which showed that RNNs could be used for a surprising variety of interesting tasks, and that they actually worked.

RNNs and their variants, unlike transformers, are ...
