9 Transformers

This chapter covers

  • Understanding the inner workings of Transformers
  • Deriving word embeddings with BERT
  • Comparing BERT and Word2Vec
  • Working with XLNet

In late 2018, researchers from Google published a paper introducing a deep learning technique that would soon become a major breakthrough: Bidirectional Encoder Representations from Transformers, or BERT (Devlin et al. 2018). Like Word2Vec, BERT aims to derive word embeddings from raw textual data, but it does so in a more powerful manner: it takes both the left and the right context of a word into account when learning its vector representation (figure 9.1). Word2Vec, in contrast, assigns each word a single static vector, regardless of the context in which the word appears. But this is not the only difference. BERT is grounded in ...
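To make this contrast concrete, the following sketch (not from the chapter's own listings) compares the two approaches using the Hugging Face transformers library and the bert-base-uncased checkpoint, both assumptions on my part. It extracts the BERT vector for the word "bank" in two different sentences and shows that the vectors differ, whereas a Word2Vec model would return the same vector in both cases.

import torch
from transformers import BertTokenizer, BertModel

# Illustrative only: assumes the `transformers` package and the
# `bert-base-uncased` checkpoint are available locally or downloadable.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def bert_vector(sentence, word):
    """Return the BERT embedding of `word` as it occurs in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Locate the (first) token that matches the word.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    idx = tokens.index(word)
    return outputs.last_hidden_state[0, idx]

# The same surface form receives different vectors in different contexts.
v_river = bert_vector("he sat on the bank of the river", "bank")
v_money = bert_vector("she deposited money at the bank", "bank")
print(torch.cosine_similarity(v_river, v_money, dim=0))  # noticeably below 1.0

# A Word2Vec model, by comparison, stores one fixed vector per word:
# with gensim, model.wv["bank"] is identical no matter the sentence.

The cosine similarity between the two "bank" vectors is well below 1.0, which is exactly the point: BERT's embeddings are contextual, while Word2Vec's are static lookups.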
