13 Measuring text similarities

This section covers

  • What is natural language processing?
  • Comparing texts based on word overlap
  • Comparing texts using one-dimensional arrays called vectors
  • Comparing texts using two-dimensional arrays called matrices
  • Efficient matrix computation using NumPy
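
As a rough illustration of the word-overlap idea listed above, the following minimal sketch scores two texts by the fraction of distinct words they share. The choice of Jaccard similarity, the helper names `tokenize` and `jaccard_similarity`, and the sample sentences are assumptions made for this example; the excerpt does not say which overlap measure the chapter uses.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard_similarity(text_a, text_b):
    """Return shared distinct words divided by total distinct words."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    union = words_a | words_b
    if not union:
        # Two empty texts share nothing measurable; treat as zero overlap.
        return 0.0
    return len(words_a & words_b) / len(union)

# Hypothetical sample texts, chosen only to exercise the function.
text1 = "She sells seashells by the seashore."
text2 = "Seashells! The seashells are on sale! By the seashore."
similarity = jaccard_similarity(text1, text2)
print(f"Word-overlap similarity: {similarity:.3f}")
```

The score ranges from 0 (no words in common) to 1 (identical vocabularies). Because it ignores word counts and word order, it is only a coarse first measure of similarity.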

Rapid text analysis can save lives. Consider a real-world incident in which US soldiers stormed a terrorist compound. In the compound, they discovered a computer containing terabytes of archived data. The data included documents, text messages, and emails pertaining to terrorist activities. The documents were too numerous for any single person to read. Fortunately, the soldiers were equipped with special software that could perform very fast text analysis. ...

