5

Cosine Similarity

One significant capability of NLP is finding words or documents that are semantically related. In a search engine, we want to find words or documents that are similar to our search terms. To do that, the engine measures the similarity between two words or between two documents. We cannot compare two words or documents simply by the characters they contain, but we can compare them mathematically in their latent vector space. In Chapter 4, Latent Semantic Analysis with scikit-learn, we learned how to represent documents as vectors. Because documents are represented as vectors, they can be compared mathematically. How do we measure the similarity of two vectors? The measure is called cosine similarity ...
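To make the idea of comparing vectors concrete, here is a minimal sketch that assumes the standard definition of cosine similarity: the dot product of two vectors divided by the product of their lengths. The function name `cosine_similarity` and the three small vectors are hypothetical, made up for illustration; they are not taken from the book's own code or from Chapter 4.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional document vectors (illustrative values only).
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # points in the same direction as doc_a
doc_c = [0.0, 0.0, 3.0]  # shares no non-zero dimension with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)  # approximately 1: same direction
sim_ac = cosine_similarity(doc_a, doc_c)  # 0: orthogonal vectors
```

Because the measure depends only on the angle between the vectors, `doc_b` scores as maximally similar to `doc_a` even though it is twice as long. That property is useful when documents of different lengths cover the same topics.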

