Appendix D

GPT, BERT, and RoBERTa

This appendix logically follows Chapter 15, “Attention and the Transformer.”

In Chapter 15, we described the Transformer architecture and how it can be used for natural language translation. Transformers have also been used as building blocks to solve other natural language processing (NLP) problems. In this appendix, we describe three such examples.

A key idea is to pretrain a basic model on a large text corpus. Through this pretraining, the model learns general language structure. The model can then either be used as is to solve a different kind of task or be extended with additional layers and fine-tuned for the actual task at hand. That is, these kinds of models make use of transfer learning. We saw ...
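As a minimal sketch of this pretrain-then-fine-tune pattern, the example below loads a pretrained BERT encoder and extends it with a classification head, using the Hugging Face transformers library alongside TensorFlow/Keras (an assumption; the book itself uses TensorFlow, but this particular library, the model name bert-base-uncased, the two-class head, and the learning rate are illustrative choices, not taken from the text).

```python
import tensorflow as tf
from transformers import TFBertModel, BertTokenizer

# Pretrained components: the tokenizer and the BERT encoder weights
# learned during pretraining on a large text corpus.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = TFBertModel.from_pretrained("bert-base-uncased")

num_classes = 2  # illustrative: e.g., binary sentiment classification

# Extend the pretrained encoder with a task-specific classification head.
input_ids = tf.keras.Input(shape=(None,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(None,), dtype=tf.int32,
                                name="attention_mask")
outputs = bert(input_ids, attention_mask=attention_mask)
cls_embedding = outputs.last_hidden_state[:, 0, :]  # [CLS] token vector
logits = tf.keras.layers.Dense(num_classes)(cls_embedding)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=logits)
model.compile(
    optimizer=tf.keras.optimizers.Adam(2e-5),  # small LR for fine-tuning
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Calling model.fit(...) on labeled task data would then fine-tune all
# weights, pretrained encoder and new head alike, for the downstream task.
enc = tokenizer(["A sample sentence."], return_tensors="tf", padding=True)
predictions = model([enc["input_ids"], enc["attention_mask"]])
```

Freezing the pretrained encoder and training only the new head is the "use as is" variant mentioned above; letting all weights update, as sketched here, is full fine-tuning.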
