6

Pretraining a Transformer from Scratch through RoBERTa

Sometimes, a pretrained model will not provide the results you expect. Even after additional fine-tuning, it may still not work as planned. At that point, one approach is to pretrain a model from scratch, using a platform such as Hugging Face to leverage architectures like GPT and BERT, among others. Once you have pretrained a model from scratch, you will know how to train any other model you might need for a project.

In this chapter, we will build a RoBERTa model, an advanced variant of BERT, from scratch. The model will use the building blocks of the transformer construction kit that BERT-style models require. Also, no pretrained tokenizers or models ...
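To make the from-scratch workflow concrete, the sketch below shows one way it can be done with the Hugging Face tokenizers, transformers, and datasets libraries: train a byte-level BPE tokenizer on a raw corpus, initialize a randomly weighted RoBERTa model from a configuration, and pretrain it with the masked language modeling objective. The corpus file corpus.txt, the output directory my_roberta, and all hyperparameters are illustrative assumptions, not values from this chapter.

```python
# A minimal sketch of pretraining a RoBERTa-style model from scratch.
# "corpus.txt", "my_roberta", and the hyperparameters are hypothetical.
from tokenizers import ByteLevelBPETokenizer
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# 1. Train a byte-level BPE tokenizer from scratch on the raw corpus.
bpe_tokenizer = ByteLevelBPETokenizer()
bpe_tokenizer.train(
    files=["corpus.txt"],
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
bpe_tokenizer.save_model("my_roberta")  # writes vocab.json and merges.txt

# 2. Reload it as a fast tokenizer and define a small RoBERTa configuration.
tokenizer = RobertaTokenizerFast.from_pretrained("my_roberta", model_max_length=512)
config = RobertaConfig(
    vocab_size=52_000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)
model = RobertaForMaskedLM(config)  # randomly initialized, no pretrained weights

# 3. Tokenize the corpus and pretrain with masked language modeling.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="my_roberta",
        num_train_epochs=1,
        per_device_train_batch_size=16,
    ),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("my_roberta")
```

The key point of this sketch is step 2: because the model is built from a RobertaConfig rather than loaded with from_pretrained, every weight starts out random, so the model learns only from the corpus you give it.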
