July 2020
Intermediate to advanced
360 pages
7h 8m
English
Term Frequency
LIKE QUENEAU’S STORY, the computational task in this book is trivial: given a text file, we want to display the N (e.g. 25) most frequent words and corresponding frequencies ordered by decreasing value of frequency. We should make sure to normalize for capitalization and to ignore stop words like “the,” “for,” etc. To keep things simple, we don’t care about the ordering of words that have equal frequencies. This computational task is known as term frequency.
Here is an example of an input file and corresponding output after computing the term frequency:
Input:
White tigers live mostly in India
Wild lions live mostly in Africa

Output:
live - 2
mostly - 2
africa - 1
india - 1
lions - 1
tigers - 1
white - 1
...
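To make the task concrete, here is a minimal sketch of one way to compute term frequency in Python. It is an illustration under stated assumptions, not the book's own implementation: the function name `term_frequency`, the small inline `STOP_WORDS` set, and the choice to treat any run of ASCII letters as a word are all hypothetical. It takes the text as a string, so reading the input file is left to the caller.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word set; a real run would load a fuller list.
STOP_WORDS = {"the", "for", "in", "a", "an", "of", "to", "and"}


def term_frequency(text, n=25, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop words in text as (word, count) pairs."""
    # Lowercase to normalize capitalization, then treat runs of letters as words
    # (an assumed definition of "word" for this sketch).
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    # most_common orders by decreasing count; the order among ties is not
    # something the task cares about.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "White tigers live mostly in India\nWild lions live mostly in Africa\n"
    for word, count in term_frequency(sample):
        print(f"{word} - {count}")
```

On the two-line sample above, this prints "live - 2" and "mostly - 2" first, followed by the words that occur once. The words with equal counts may appear in a different order than in the example output, which the task explicitly allows.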