GPT, BERT, and RoBERTa
In Chapter 15, we described the Transformer architecture and how it can be used for natural language translation. Transformers have also been used as building blocks to solve other natural language processing (NLP) problems. In this appendix, we describe three such examples.
A key idea is to pretrain a basic model on a large text corpus. As a result of this pretraining, the model learns general language structure. This model can then either be used as is to solve a different kind of task or be extended with additional layers and fine-tuned for the actual task at hand. That is, these kinds of models make use of transfer learning. We saw ...
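The transfer-learning recipe just described can be sketched in a few lines of code. The sketch below is purely illustrative (pure Python, no deep learning framework): a fixed feature extractor stands in for the frozen pretrained Transformer body, and only a small task-specific head is trained on the downstream task. All names and the toy task are assumptions for the example, not part of any real model.

```python
# Minimal sketch of pretrain-then-fine-tune (illustrative only).
# The "pretrained body" is a frozen feature extractor standing in
# for a Transformer encoder; only the small head is trained.
import random

random.seed(0)

def pretrained_body(x):
    """Frozen stand-in for a pretrained encoder: maps a scalar
    input to a 2-dimensional feature vector. Never updated."""
    return [x, x * x]

# Task-specific head: a linear layer whose weights we fine-tune.
w = [random.uniform(-0.1, 0.1) for _ in range(2)]
b = 0.0

def head(features):
    return sum(wi * fi for wi, fi in zip(w, features)) + b

# Toy downstream task: y = 3*x^2 + 1, which the frozen features
# can express, so only the head needs to adapt.
data = [(x / 10.0, 3 * (x / 10.0) ** 2 + 1) for x in range(-10, 11)]

lr = 0.1
for epoch in range(500):
    for x, y in data:
        feats = pretrained_body(x)  # frozen: no update flows here
        err = head(feats) - y
        # Gradient step on the head only (squared-error loss).
        for i in range(2):
            w[i] -= lr * err * feats[i]
        b -= lr * err

# After fine-tuning, the head approximates the target mapping.
print(head(pretrained_body(0.5)))
```

The same division of labor appears in practice: the pretrained Transformer parameters are either frozen or updated with a small learning rate, while the freshly added task head is trained from scratch on the downstream data.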