Latent Semantic Analysis

Latent Semantic Analysis (LSA), also known as Latent Semantic Indexing (LSI), is an application of unsupervised dimensionality reduction techniques to textual data.

LSA tries to solve two problems:

  • Synonymy: multiple words having the same meaning
  • Polysemy: one word having multiple meanings

Shallow term-based techniques such as Bag of Words cannot solve these problems because they look only at the exact surface form of each term. For instance, the words help and assist are assigned to different dimensions of the Vector Space, even though they are semantically very close.
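
To make the synonymy problem concrete, the following is a minimal sketch in plain Java. The class name BagOfWordsDemo and the helpers bagOfWords, norm, and cosine are hypothetical names introduced here for illustration, and the whitespace tokenizer is a simplifying assumption. The sketch builds term-count vectors for two one-word documents and compares them with cosine similarity.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical, not from any library.
class BagOfWordsDemo {

    // Count how often each lowercased, whitespace-separated token occurs.
    static Map<String, Integer> bagOfWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean length of a sparse term-count vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity: only terms present in both documents contribute.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> first = bagOfWords("help");
        Map<String, Integer> second = bagOfWords("assist");
        // The two terms occupy different dimensions, so the vectors are orthogonal.
        System.out.println(cosine(first, second)); // prints 0.0
    }
}
```

Because the two documents share no term, their dot product is zero and the similarity is 0.0, even though a human reader would consider them nearly identical in meaning.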

To solve these problems, LSA moves the documents from the usual Bag of Words Vector Space to some ...
