April 2017
As noted previously, although both word2vec models can be reduced to a classification problem, we are not really interested in the classification problem itself. Rather, we are interested in the side effect of this classification process: the weight matrix that transforms a word in the vocabulary into its dense, low-dimensional distributed representation.
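To make this concrete, here is a minimal sketch (not the book's code) using a hypothetical toy vocabulary and randomly initialized weights in place of trained ones. It shows why the weight matrix is the representation: looking up a word's embedding is equivalent to multiplying its one-hot vector by the input-to-hidden weight matrix, which simply selects one row.

```python
import numpy as np

# Hypothetical sizes for illustration: a 5-word vocabulary, 3-dim embeddings.
vocab = ["the", "cat", "sat", "on", "mat"]
vocab_size, embed_dim = len(vocab), 3

# In a trained word2vec model, W would be the learned input-to-hidden
# weight matrix; here it is random, since only the mechanics matter.
rng = np.random.default_rng(42)
W = rng.normal(size=(vocab_size, embed_dim))

word_index = {w: i for i, w in enumerate(vocab)}

def embedding(word):
    """One-hot vector times W just selects the corresponding row of W."""
    one_hot = np.zeros(vocab_size)
    one_hot[word_index[word]] = 1.0
    return one_hot @ W

# The dense representation of "cat" is row word_index["cat"] of W.
assert np.allclose(embedding("cat"), W[word_index["cat"]])
```

This is why, once training finishes, the classifier's output layer can be discarded and the weight matrix alone kept as the embedding table.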
There are many examples of how these distributed representations often exhibit surprising syntactic and semantic information. For example, as shown in the following figure from Tomas Mikolov's presentation at NIPS 2013 (for more information refer to the article: Learning Representations of Text using Neural Networks, by T. Mikolov, ...