appendix C Natural language processing

C.1 Touring around the zoo: Meeting other Transformer models

In chapter 13, we discussed a powerful Transformer-based model known as BERT (bidirectional encoder representations from Transformers). But BERT was just the beginning of a wave of Transformer models. Successive models grew more capable, either by addressing theoretical shortcomings of BERT or by re-engineering various aspects of the model to run faster and perform better. Let's look at some of these popular models to understand what sets them apart from BERT.

C.1.1 Generative pre-training (GPT) model (2018)

The story actually starts even before BERT. OpenAI introduced a model called GPT in the paper “Improving Language Understanding by Generative Pre-Training” ...
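To give a concrete sense of what "generative pre-training" refers to, the sketch below shows the next-token-prediction objective that GPT-style models are trained with: the model reads each prefix of a token sequence and is scored on how well it predicts the token that follows. This is only a minimal illustration; the tiny Keras model here is a hypothetical stand-in for GPT's Transformer decoder, not the actual architecture or training setup.

```python
import tensorflow as tf

vocab_size = 100  # toy vocabulary size for illustration

# A toy token sequence; in practice these come from a tokenized corpus.
tokens = tf.constant([[5, 17, 42, 8, 23]])

inputs = tokens[:, :-1]   # the model sees tokens 0..n-1 ...
targets = tokens[:, 1:]   # ... and must predict tokens 1..n

# Stand-in for a Transformer decoder: embedding + vocabulary projection.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 32),
    tf.keras.layers.Dense(vocab_size),
])

# Per-position logits over the vocabulary, shape [batch, steps, vocab].
logits = model(inputs)

# Cross-entropy between the predicted distribution at each position
# and the actual next token: the generative pre-training loss.
loss = tf.keras.losses.sparse_categorical_crossentropy(
    targets, logits, from_logits=True)
print(tf.reduce_mean(loss))
```

In a real GPT model, the stand-in layers would be replaced by a stack of masked self-attention blocks so that each position can only attend to earlier tokens, but the training signal is exactly this next-token loss.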
