Assessments

The following are the answers to the questions given at the end of each chapter.

Chapter 1, A Primer on Transformers

  1. The steps involved in the self-attention mechanism are given here: 
  • First, we compute the dot product between the query matrix, Q, and the key matrix, K^T, to obtain the similarity scores.
  • Next, we divide QK^T by the square root of the dimension of the key vector, sqrt(d_k).
  • Then, we apply the softmax function to normalize the scores and obtain the score matrix, softmax(QK^T / sqrt(d_k)).
  • Finally, we compute the attention matrix, Z, by multiplying the score matrix by the value matrix, V.
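The steps above can be sketched as a small NumPy function. This is a minimal illustration of scaled dot-product self-attention, not the book's own code; the toy matrices and seed are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    d_k = K.shape[-1]
    # Step 1: dot product of queries and keys gives similarity scores
    scores = Q @ K.T
    # Step 2: scale by the square root of the key dimension
    scaled = scores / np.sqrt(d_k)
    # Step 3: softmax normalizes each row into attention weights
    weights = softmax(scaled, axis=-1)
    # Step 4: the attention matrix Z is the weighted sum of the values
    return weights @ V

# Toy example: 3 tokens with 4-dimensional embeddings (hypothetical values)
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
Z = self_attention(Q, K, V)
print(Z.shape)  # (3, 4): one output vector per input token
```

Note that each row of the softmax output sums to 1, so each output vector in Z is a convex combination of the value vectors.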
