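TF-IDF measures the relative importance of a term to a document within a corpus (a collection of documents). The weight $w_{i,j}$ of term $i$ in document $j$ is given by the following formula:

$$w_{i,j} = \mathrm{tf}_{i,j} \times \log\left(\frac{N}{\mathrm{df}_i}\right)$$
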
Where:
- $\mathrm{tf}_{i,j}$ = number of occurrences of term $i$ in document $j$
- $\mathrm{df}_i$ = number of documents containing term $i$
- $N$ = total number of documents
Consider a document that contains 1,000 words, in which the word "rat" appears 3 times. The term frequency (TF) for "rat", normalized here by document length, is 3/1000 = 0.003. Now suppose the corpus holds 10,000 documents and "rat" appears in 1,000 of them. Using a base-10 logarithm, the inverse document frequency (IDF) is log(10000/1000) = 1. The TF-IDF weight is the product of these two quantities: 0.003 * 1 = 0.003.
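The arithmetic in this example can be checked with a short Python sketch. The function name `tf_idf` and its parameter names are illustrative choices, not part of any library, and the sketch assumes the same conventions as the example: TF divided by document length and a base-10 logarithm for IDF.

```python
import math


def tf_idf(term_count, doc_length, n_docs, doc_freq):
    """Return a TF-IDF weight (hypothetical helper for illustration).

    Assumptions: TF is the raw count divided by the document length,
    and IDF uses a base-10 logarithm, as in the worked example.
    """
    tf = term_count / doc_length          # 3 / 1000 = 0.003
    idf = math.log10(n_docs / doc_freq)   # log10(10000 / 1000) = 1
    return tf * idf


# Numbers from the worked example: 3 occurrences in a 1,000-word document,
# with the term present in 1,000 of 10,000 documents.
weight = tf_idf(term_count=3, doc_length=1000, n_docs=10_000, doc_freq=1000)
print(round(weight, 6))  # 0.003
```

Other conventions are common in practice (natural logarithm, raw counts for TF, smoothing terms in the IDF), so the absolute value of the weight depends on which variant is used.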
The words or features in the ...