9 Transformers

This chapter covers

  • Understanding the inner workings of Transformers
  • Deriving word embeddings with BERT
  • Comparing BERT and Word2Vec
  • Working with XLNet

In late 2018, researchers from Google published a paper introducing a deep learning technique that would soon become a major breakthrough: Bidirectional Encoder Representations from Transformers, or BERT (Devlin et al. 2018). Like Word2Vec, BERT aims to derive word embeddings from raw textual data, but it does so in a more powerful manner: it takes both the left and the right context of a word into account when learning its vector representation (figure 9.1). Word2Vec, in contrast, assigns each word a single static vector, regardless of the context in which the word appears. But this is not the only difference. BERT is grounded in ...
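To make this contrast concrete, the following sketch (not from the chapter's own listings) compares the two approaches using the Hugging Face transformers library and the bert-base-uncased checkpoint, both assumptions on my part. It extracts the BERT vector for the word "bank" in two different sentences and shows that the vectors differ, whereas a Word2Vec model would return the same vector in both cases.

import torch
from transformers import BertTokenizer, BertModel

# Illustrative only: assumes the `transformers` package and the
# `bert-base-uncased` checkpoint are available locally or downloadable.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def bert_vector(sentence, word):
    """Return the BERT embedding of `word` as it occurs in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Locate the (first) token that matches the word.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    idx = tokens.index(word)
    return outputs.last_hidden_state[0, idx]

# The same surface form receives different vectors in different contexts.
v_river = bert_vector("he sat on the bank of the river", "bank")
v_money = bert_vector("she deposited money at the bank", "bank")
print(torch.cosine_similarity(v_river, v_money, dim=0))  # noticeably below 1.0

# A Word2Vec model, by comparison, stores one fixed vector per word:
# with gensim, model.wv["bank"] is identical no matter the sentence.

The cosine similarity between the two "bank" vectors is well below 1.0, which is exactly the point: BERT's embeddings are contextual, while Word2Vec's are static lookups.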
