Calculating the relative importance of each term
This recipe demonstrates the true power of Trident, using many of its abstractions to calculate the TF-IDF value. Before the recipe is presented, it is important to understand the simple math behind TF-IDF. We will need the following components to calculate it:
tf(t,d): This component specifies the term frequency, that is, the number of times a given term (t) appears in a given document (d)
df(t): This component specifies the document frequency, that is, how frequently a given term (t) appears across all documents
D: This component specifies the document count, that is, the total number of documents
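To make the relationship between the three components concrete, the following is a minimal sketch of how they might be combined into a single score. It assumes one common formulation, tf(t,d) multiplied by log(D / (1 + df(t))), which may differ from the variant a given implementation uses. The class name TfIdfSketch and the method tfIdf are hypothetical and are not part of Trident.

```java
// Minimal sketch: combines tf(t,d), df(t), and D into a TF-IDF score.
// Assumption: idf is computed as log(D / (1 + df)); other variants exist.
// The class and method names are hypothetical, not part of Trident.
final class TfIdfSketch {

    /**
     * @param tf term frequency: occurrences of the term in the document
     * @param df document frequency of the term across all documents
     * @param d  document count: total number of documents
     * @return the TF-IDF score under the assumed formulation
     */
    static double tfIdf(long tf, long df, long d) {
        // Adding 1 to df avoids division by zero for a term not yet seen.
        return tf * Math.log((double) d / (1.0 + df));
    }

    public static void main(String[] args) {
        // A term appearing 3 times in a document, with df = 1 and D = 10.
        System.out.println(tfIdf(3, 1, 10));
        // With df = 9 and D = 10 the ratio is 1, so the logarithm is zero.
        System.out.println(tfIdf(3, 9, 10));
    }
}
```

Under this formulation, a term that is frequent in one document but rare across the collection receives a high score, while a term that appears almost everywhere is weighted toward zero.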
There are many ways to calculate the term frequency; for this recipe, we will ...