Using a TF-IDF model
Although we often speak of training a TF-IDF model, TF-IDF is really a feature extraction step, or transformation, rather than a machine learning model. The weighted vectors it produces are typically used as input to other techniques, such as dimensionality reduction, classification, or regression.
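To make the distinction concrete, here is a minimal sketch in plain Python (not the Spark API) of TF-IDF as a two-step transformation: the only thing "learned" is a table of document frequencies, which is then applied as a fixed reweighting of term counts. The function names fit_idf and tfidf, the toy documents, and the smoothed formula log((N + 1) / (df + 1)) are assumptions chosen for illustration; other IDF variants are equally valid.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute a smoothed IDF weight for every term in a tokenized corpus.

    Assumed formula: log((N + 1) / (df + 1)), where N is the number of
    documents and df is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    return {term: math.log((n_docs + 1) / (df + 1))
            for term, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Reweight raw term counts by IDF; terms unseen during fitting are dropped."""
    counts = Counter(tokens)
    return {term: count * idf[term]
            for term, count in counts.items() if term in idf}


# Toy corpus (hypothetical), already tokenized.
docs = [
    ["spark", "mllib", "spark"],
    ["spark", "hockey"],
    ["hockey", "game"],
]

idf = fit_idf(docs)                       # the "training" step: just counting
vectors = [tfidf(d, idf) for d in docs]   # the transformation step
```

No labels or objective function are involved: terms that occur in many documents (here, spark and hockey) receive a lower weight than rarer terms (mllib, game), and the resulting sparse vectors can be passed on to whatever model comes next.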
To illustrate the potential uses of TF-IDF weighting, we will explore two examples. The first uses the TF-IDF vectors to compute document similarity, while the second trains a multiclass classification model with the TF-IDF vectors as input features.
Document similarity with the 20 Newsgroups dataset and TF-IDF features
You might recall from Chapter 4, Building a Recommendation Engine with Spark, that the similarity between two vectors ...