Neural probabilistic language model

It is possible to learn the language model and, implicitly, the embedding function via a fully connected feedforward network. Given a sequence of n-1 words (w_{t-n+1}, ..., w_{t-1}), it tries to output the probability distribution of the next word, w_t (the following diagram is based on http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf):

A neural network language model that outputs the probability distribution of the word w_t, given the words w_{t-n+1}, ..., w_{t-1}. C is the embedding matrix.

The network layers play different roles, such as the following:

  1. The embedding layer takes the one-hot representation of each input word and transforms it into that word's dense embedding vector by multiplying it with the embedding matrix C, which is shared across all context positions.
  2. The hidden layer takes the concatenated embeddings of the n-1 context words and applies a non-linearity (tanh in the original paper).
  3. The output layer produces one score per vocabulary word and applies softmax to turn the scores into the probability distribution of the next word, w_t.
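
To make the architecture concrete, here is a minimal PyTorch sketch of such a model. The vocabulary size, context length, and layer dimensions are illustrative assumptions rather than values taken from the paper.

```python
# A minimal sketch of the feedforward language model described above.
# Vocabulary size, context size, and layer dimensions are assumptions.
import torch
import torch.nn as nn

class NeuralProbabilisticLM(nn.Module):
    def __init__(self, vocab_size, context_size, embedding_dim=100, hidden_dim=128):
        super().__init__()
        # Embedding matrix C: maps word indices to dense vectors
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # Hidden layer applied to the concatenated context embeddings
        self.hidden = nn.Linear(context_size * embedding_dim, hidden_dim)
        # Output layer: one score per vocabulary word
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_words):
        # context_words: tensor of shape (batch, context_size) with word indices
        embedded = self.embedding(context_words)         # (batch, context_size, embedding_dim)
        flattened = embedded.view(embedded.size(0), -1)  # concatenate the context embeddings
        h = torch.tanh(self.hidden(flattened))           # tanh hidden layer
        logits = self.output(h)
        # Log-probability distribution over the next word w_t
        return torch.log_softmax(logits, dim=-1)

# Usage: next-word distributions for a batch of two 4-word contexts
model = NeuralProbabilisticLM(vocab_size=10000, context_size=4)
contexts = torch.randint(0, 10000, (2, 4))  # random word indices as stand-in data
log_probs = model(contexts)                 # shape: (2, 10000)
```

Training such a model with cross-entropy loss on the true next word learns both the language model and, as a by-product, the rows of the embedding matrix C.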
