Neural probabilistic language model

It is possible to learn the language model and, implicitly, the embedding function with a feedforward fully connected network. Given a sequence of n-1 words, w_{t-n+1}, ..., w_{t-1}, the network outputs the probability distribution of the next word, w_t (the following diagram is based on http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf):

A neural network language model that outputs the probability distribution of the word w_t, given the words w_{t-n+1}, ..., w_{t-1}. C is the embedding matrix
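Concretely, in the model of the Bengio et al. paper linked above, the probability is computed by a softmax over one score per vocabulary word. As a hedged summary (the symbols x, H, U, W, b, d below are my shorthand for the concatenated context embeddings and the layer parameters, following the paper's notation rather than any code from this book):

$$P(w_t \mid w_{t-n+1}, \dots, w_{t-1}) = \mathrm{softmax}\bigl(b + Wx + U\,\tanh(d + Hx)\bigr)_{w_t}$$

where x is the concatenation of the embeddings C(w_{t-n+1}), ..., C(w_{t-1}), H and d parameterize the hidden layer, U and b the output layer, and W captures the paper's optional direct connections from the embeddings to the output (it can be omitted).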

The network layers play different roles, such as the following:

  1. The embedding layer takes the one-hot representation of each input word w_{t-n+1}, ..., w_{t-1} and maps it to a dense embedding vector via the shared embedding matrix C (see the sketch after this list).

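To make the architecture concrete, here is a minimal PyTorch sketch of such a feedforward language model. The layer sizes, class name, and training-free usage below are illustrative assumptions, not code from the book or the paper:

```python
import torch
import torch.nn as nn

class NPLM(nn.Module):
    """A feedforward neural language model: embeddings -> hidden layer -> softmax."""
    def __init__(self, vocab_size, embed_dim=100, context_size=4, hidden_dim=256):
        super().__init__()
        # C: the shared embedding matrix; maps each word index to a dense vector
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Hidden layer over the concatenated context embeddings
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        # Output layer producing one score per vocabulary word
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):
        # context: (batch, context_size) tensor of word indices
        embeds = self.embedding(context)            # (batch, context_size, embed_dim)
        x = embeds.view(context.shape[0], -1)       # concatenate the context embeddings
        h = torch.tanh(self.hidden(x))              # nonlinear hidden layer
        return self.output(h)                       # logits over the next word w_t

# Usage: next-word probabilities for a batch of 4-word contexts (random data for illustration)
model = NPLM(vocab_size=10000)
contexts = torch.randint(0, 10000, (8, 4))
probs = torch.softmax(model(contexts), dim=-1)      # shape (8, 10000)
```

Training such a model with a cross-entropy loss on the true next word learns both the prediction layers and, implicitly, the embedding matrix C.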