December 2018
Beginner to intermediate
684 pages
21h 9m
English
The bag-of-words model creates document vectors that reflect the presence and relevance of tokens in each document. Latent semantic analysis reduces the dimensionality of these vectors and, in the process, identifies what can be interpreted as latent concepts. Latent Dirichlet allocation represents both documents and terms as vectors that contain the weights of latent topics.
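The progression above can be sketched with scikit-learn: a minimal, illustrative example (the toy corpus and component counts are assumptions, not from the text) that builds bag-of-words counts, then derives latent concepts with truncated SVD (the workhorse behind LSA) and topic weights with LDA.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

# Toy corpus for illustration only
docs = [
    "stocks rally as markets rise",
    "bonds fall while stocks rally",
    "central bank raises interest rates",
    "interest rates affect bond markets",
]

# Bag-of-words: sparse document-term matrix of token counts
bow = CountVectorizer().fit_transform(docs)

# LSA: project documents into a low-dimensional "concept" space
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_concepts = lsa.fit_transform(bow)          # shape: (n_docs, 2)

# LDA: documents as mixtures of latent topics (rows sum to 1)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(bow)            # shape: (n_docs, 2)
```

Each row of `doc_topics` is a probability distribution over the two latent topics, while `lda.components_` gives the corresponding term weights per topic.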
The dimensions of the word and phrase vectors do not have an explicit meaning. However, the embeddings encode similarity of usage as proximity in the latent space, and this proximity carries over to semantic relationships. This results in the interesting property that analogies can be expressed by adding and subtracting ...
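The analogy property can be demonstrated with plain NumPy on a handful of toy vectors (the embeddings below are invented for illustration; real embeddings would come from a trained model such as word2vec): subtracting "man" from "king" and adding "woman" lands closest, by cosine similarity, to "queen".

```python
import numpy as np

# Hypothetical 3-d embeddings, hand-crafted so the analogy holds
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "apple": np.array([0.1, 0.5, 0.2]),
}

def cosine(a, b):
    """Cosine similarity: proximity measure in the embedding space."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman should point near queen
target = emb["king"] - emb["man"] + emb["woman"]
candidates = [w for w in emb if w not in {"king", "man", "woman"}]
answer = max(candidates, key=lambda w: cosine(emb[w], target))
```

With a real embedding model the same arithmetic is done over hundreds of dimensions and tens of thousands of words, but the mechanics are identical.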