November 2015
LIKE QUENEAU'S STORY, the computational task in this book is trivial: given a text file, we want to display the N (e.g. 25) most frequent words and corresponding frequencies ordered by decreasing value of frequency. We should make sure to normalize for capitalization and to ignore stop words like "the", "for", etc. To keep things simple, we don't care about the ordering of words that have equal frequencies. This computational task is known as term frequency.
Here is an example of an input file and corresponding output after computing the term frequency:
Input:
White tigers live mostly in India
Wild lions live mostly in Africa

Output:
live - 2
mostly - 2
africa - 1
india - 1
lions - 1
tigers - 1
white - 1
wild - 1
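The task described above can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation. The function name `term_frequency`, the tiny inline `STOP_WORDS` set, and the choice to treat any run of ASCII letters as a word are all assumptions made for the example; a realistic stop-word list would be far longer and would normally be loaded from a file.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"the", "for", "in", "a", "an", "of", "to", "and"}


def term_frequency(text, n=25, stop_words=STOP_WORDS):
    """Return up to n (word, count) pairs, most frequent first."""
    # Lowercase to normalize capitalization, then extract runs of letters.
    words = re.findall(r"[a-z]+", text.lower())
    # Count every word that is not a stop word.
    counts = Counter(w for w in words if w not in stop_words)
    # Order by decreasing count. Ties are broken alphabetically, which is an
    # arbitrary choice: the task does not care how equal counts are ordered.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


if __name__ == "__main__":
    sample = (
        "White tigers live mostly in India\n"
        "Wild lions live mostly in Africa\n"
    )
    for word, count in term_frequency(sample):
        print(f"{word} - {count}")
```

Running the script on the two-line sample prints one `word - count` line per distinct non-stop word. The word "in" is dropped because it appears in the assumed stop-word set, and "live" and "mostly" come first because each occurs twice.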