December 2019
Intermediate to advanced
468 pages
14h 28m
English
The vanilla transformer input is augmented with sinusoidal positional encodings (see The transformer model section), which are relevant only within the current segment. The following formula shows how to schematically compute the states
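$\mathbf{h}_{\tau}$ and $\mathbf{h}_{\tau+1}$ with the current positional encodings:

$$\mathbf{h}_{\tau+1} = f(\mathbf{h}_{\tau}, \mathbf{E}_{s_{\tau+1}} + \mathbf{U}_{1:L})$$
$$\mathbf{h}_{\tau} = f(\mathbf{h}_{\tau-1}, \mathbf{E}_{s_{\tau}} + \mathbf{U}_{1:L})$$

Here, $\mathbf{E}_{s_{\tau}}$ is the word-embedding sequence ...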
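To make the "relevant only within the current segment" point concrete, here is a minimal sketch in plain Python. It assumes the standard sinusoidal formula from the original transformer paper. The helper names `sinusoidal_encoding` and `add_positions` are hypothetical, as are the all-zero placeholder embeddings, segment length of 4, and model dimension of 8. It is an illustration under those assumptions, not the book's implementation.

```python
import math


def sinusoidal_encoding(length, d_model):
    """Return a length x d_model table of sinusoidal positional encodings."""
    table = []
    for pos in range(length):
        row = []
        for i in range(d_model):
            # Dimension pairs (2k, 2k+1) share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(segment_embeddings):
    """Add encodings for positions 0..L-1 to one segment's embeddings."""
    length = len(segment_embeddings)
    d_model = len(segment_embeddings[0])
    encodings = sinusoidal_encoding(length, d_model)
    return [
        [e + u for e, u in zip(emb_row, enc_row)]
        for emb_row, enc_row in zip(segment_embeddings, encodings)
    ]


# Two consecutive segments with placeholder (all-zero) embeddings.
# The sizes L = 4 and d_model = 8 are illustrative assumptions.
segment_a = [[0.0] * 8 for _ in range(4)]
segment_b = [[0.0] * 8 for _ in range(4)]

inputs_a = add_positions(segment_a)
inputs_b = add_positions(segment_b)

# Positions restart at 0 in every segment, so both segments receive
# exactly the same positional offsets.
print(inputs_a == inputs_b)  # prints True
```

Because the position index restarts for every segment, the first token of one segment gets the same encoding as the first token of the next. The encodings therefore carry no information about where a token sits relative to tokens in a previous segment.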