4

Pretraining a RoBERTa Model from Scratch

In this chapter, we will build a RoBERTa model from scratch. The model will be assembled from the building blocks of the transformer construction kit used for BERT-like models, and no pretrained tokenizers or pretrained models will be used. The RoBERTa model will be built following the fifteen-step process described in this chapter.
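To give a flavor of what "from scratch" means in practice, here is a minimal sketch of the first two ideas, training a tokenizer on raw text and initializing a RoBERTa model with random weights, assuming the Hugging Face transformers and tokenizers libraries. It is not the book's exact fifteen-step process, and the file name corpus.txt, the output directory, and the hyperparameters are illustrative placeholders.

from tokenizers import ByteLevelBPETokenizer
from transformers import RobertaConfig, RobertaForMaskedLM

# Train a byte-level BPE tokenizer from scratch on a plain-text corpus
# (corpus.txt is a placeholder) instead of loading a pretrained one.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
tokenizer.save_model("./tokenizer_dir")  # illustrative output directory

# Define a small RoBERTa configuration and initialize the model with
# random weights, i.e., no pretrained checkpoint is downloaded.
config = RobertaConfig(
    vocab_size=52_000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)
model = RobertaForMaskedLM(config=config)
print(model.num_parameters())  # confirm the size of the randomly initialized model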

We will use the knowledge of transformers acquired in the previous chapters to build, step by step, a model that can perform language modeling on masked tokens. In Chapter 2, Getting Started with the Architecture of the Transformer Model, we went through the building blocks of the original Transformer. In Chapter 3, Fine-Tuning BERT Models, we fine-tuned a pretrained BERT model.
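The masked language modeling objective itself can be sketched with a data collator that randomly masks a fraction of the tokens and a standard training loop. The snippet below is only an assumption-laden illustration, not the chapter's exact procedure; it reuses the model from the previous sketch, and corpus.txt, ./tokenizer_dir, ./model_dir, and the training hyperparameters are placeholders. Note that LineByLineTextDataset is deprecated in recent transformers releases but still works for a small demonstration.

from transformers import (
    RobertaTokenizerFast,
    LineByLineTextDataset,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Wrap the byte-level BPE files trained earlier in a fast tokenizer.
tokenizer = RobertaTokenizerFast.from_pretrained("./tokenizer_dir", model_max_length=512)

# Build a dataset of single lines and a collator that masks 15% of the
# tokens, which is the masked language modeling objective RoBERTa trains on.
dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="corpus.txt", block_size=128)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="./model_dir",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)
trainer = Trainer(
    model=model,  # the randomly initialized RobertaForMaskedLM from the previous sketch
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)
trainer.train()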

This chapter will focus on ...
