13 Measuring text similarities
This section covers
- Defining natural language processing
- Comparing texts based on word overlap
- Comparing texts using one-dimensional arrays called vectors
- Comparing texts using two-dimensional arrays called matrices
- Efficient matrix computation using NumPy
Rapid text analysis can save lives. Consider a real-world incident in which US soldiers stormed a terrorist compound. In the compound, they discovered a computer containing terabytes of archived data. The data included documents, text messages, and emails pertaining to terrorist activities. The documents were too numerous for any single person to read. Fortunately, the soldiers were equipped with special software that could perform very fast text analysis. ...