December 2019
Intermediate to advanced
468 pages
14h 28m
English
It is possible to learn the language model and, implicitly, the embedding function via a feedforward fully connected network. Given a sequence of n-1 words (wt-n+1 , ..., wt-1), it tries to output the probability distribution of the next word, wt (the following diagram is based on http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf):

The network layers play different roles, such as the following:
Read now
Unlock full access