Transformers are deep learning architectures introduced by Google in 2017 that are designed to process sequential data for downstream tasks such as translation, question answering, or text summarization. In this respect, they address a similar problem to the RNNs discussed in Chapter 9, Recurrent Neural Networks, but Transformers have a significant advantage: they do not need to process the data in order. Among other benefits, this allows a higher degree of parallelization and therefore faster training.
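The order-independence comes from self-attention, which relates every position in a sequence to every other position in a single matrix operation rather than step by step. As a rough illustration (a minimal NumPy sketch, not the full multi-head mechanism used in practice), scaled dot-product attention over a whole sequence looks like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend all positions to all others in one shot.

    Q, K, V: arrays of shape (seq_len, d_k). Unlike an RNN, no
    position waits for the previous one, so the matrix products
    below can be parallelized across the whole sequence.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                # (seq_len, d_k)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))   # 5 tokens, embedding size 8
out = scaled_dot_product_attention(x, x, x)          # self-attention
print(out.shape)              # (5, 8)
```

Here the same embeddings `x` serve as queries, keys, and values (self-attention); real Transformer layers first project them through learned weight matrices and repeat this across several heads.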
Due to their flexibility, Transformers can be pretrained on large bodies of unlabeled data and then fine-tuned for other tasks. Two main groups of such pretrained models are Bidirectional Encoder Representations from Transformers ...