In this chapter, we will learn about several variants of BERT, such as ALBERT, RoBERTa, ELECTRA, and SpanBERT. We will start with ALBERT, which stands for A Lite version of BERT. ALBERT makes a few architectural changes to BERT that reduce the number of parameters and the training time. We will cover in detail how ALBERT works and how it differs from BERT.
Moving on, we will learn about the RoBERTa model, which stands for Robustly Optimized BERT Pre-training Approach. RoBERTa is one of the most popular variants of BERT and is used in many state-of-the-art systems. RoBERTa works similarly to BERT but makes a few changes to the pre-training steps. We will explore ...