Chapter 8. BERTology: Putting It All Together

We’ve come a long way together since we first fiddled with spaCy in Chapter 1. We began by solving the most common NLP problems using the microwave-meal equivalent of deep learning libraries, then worked our way down to the low-level details, including tokenization and embeddings. Along the way, we covered recurrent networks, including RNNs, LSTMs, and GRUs, as well as the Transformer architecture and attention mechanisms.
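
As a quick refresher on where we started, here is a minimal sketch of the kind of spaCy usage we covered back in Chapter 1 (the pipeline name en_core_web_sm and the example sentence are illustrative assumptions here, not a listing from that chapter):

    import spacy

    # Load a small pretrained English pipeline (assumes it has been
    # installed via: python -m spacy download en_core_web_sm)
    nlp = spacy.load("en_core_web_sm")

    # One call runs the whole pipeline: tokenization, tagging,
    # parsing, and named entity recognition
    doc = nlp("BERT was released by Google in 2018.")

    # Inspect the recognized entities
    for ent in doc.ents:
        print(ent.text, ent.label_)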

This chapter, in many ways, is the grand finale. We will tie all the pieces together and retrace the steps that led to NLP’s so-called ImageNet moment in 2018, which has since sparked a flurry of excitement about the potential commercial applications of these advances. We will touch on some of these possibilities, too. Let’s get started.

ImageNet

It’s worth taking a moment to clarify what we mean by “ImageNet moment.” ImageNet is a computer vision dataset originally published in 2009. It became a benchmark for progress in image classification, a core computer vision task, and spawned an annual competition to see which research team could identify the objects in the dataset’s images with the lowest error rate.

The high visibility of the competition helped spur significant advances in computer vision starting in 2010. From 2010 through 2017, the winning accuracy jumped from 71.8% to 97.3%, surpassing human-level performance and capturing ...
