June 2020
Intermediate to advanced
382 pages
11h 39m
English
One way to determine how similar documents are to one another is to first process them into a structured form. The data structure that results from processing unstructured documents is called a Term Document Matrix (TDM), which is shown in the following diagram:

A TDM has every word in the vocabulary as a row and every document as a column, and each cell typically records how often that word occurs in that document. It can be used to establish which documents are similar to one another, based on the selected distance measure. Google News, for example, suggests news to a user based on its similarity to a news item the user has already shown interest in. ...
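To make the row and column layout concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization, raw term counts as the cell values, and cosine distance as the selected distance measure; none of these choices comes from the text above. The helper names `build_tdm` and `cosine_distance` and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def build_tdm(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    # Assumption: lowercase and split on whitespace; real pipelines usually
    # also strip punctuation and stop words.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # One row per term, one column per document.
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


def cosine_distance(matrix, j, k):
    """Cosine distance between document columns j and k of the matrix."""
    col_j = [row[j] for row in matrix]
    col_k = [row[k] for row in matrix]
    dot = sum(a * b for a, b in zip(col_j, col_k))
    norm_j = math.sqrt(sum(a * a for a in col_j))
    norm_k = math.sqrt(sum(b * b for b in col_k))
    if norm_j == 0 or norm_k == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_j * norm_k)


# Hypothetical sample documents, for illustration only.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stocks fell on monday",
]
vocabulary, tdm = build_tdm(docs)

for term, row in zip(vocabulary, tdm):
    print(f"{term:>8}  {row}")

print(cosine_distance(tdm, 0, 1))  # the two cat sentences
print(cosine_distance(tdm, 0, 2))  # cat sentence vs. stock sentence
```

The first two documents share most of their terms, so their distance comes out smaller than the distance between the first and third. A recommender along the lines of the Google News example could rank candidate documents by this distance to something the user has already read.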