Skip to Content
Java: Data Science Made Easy
book

Java: Data Science Made Easy

by Richard M. Reese, Jennifer L. Reese, Alexey Grigorev
July 2017
Beginner to intermediate
715 pages
17h 3m
English
Packt Publishing
Content preview from Java: Data Science Made Easy

Latent Semantic Analysis

Latent Semantic Analysis (LSA), also known as Latent Semantic Indexing (LSI), is an application of unsupervised dimensionality reduction techniques to textual data.

The problems that LSA tries to solve are the problems of:

  • Synonymy: This means multiple words having the same meaning
  • Polysemy: This means one word having multiple meanings

Shallow term-based techniques such as Bag of Words cannot solve these problems because they only look at the exact raw form of terms. For instance, words such as help and assist will be assigned to different dimensions of the Vector Space, even though they are very close semantically.

To solve these problems, LSA moves the documents from the usual Bag of Words Vector Space to some ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Java Data Science Cookbook

Java Data Science Cookbook

Rushdi Shams
Java for Data Science

Java for Data Science

Walter Molina, Richard M. Reese, Shilpi Saxena, Jennifer L. Reese

Publisher Resources

ISBN: 9781788475655Supplemental Content