Exercises in Programming Style by Cristina Videira Lopes

Prologue

Term Frequency

LIKE QUENEAU'S STORY, the computational task in this book is trivial: given a text file, we want to display the N (e.g., 25) most frequent words and their corresponding frequencies, ordered by decreasing frequency. We should make sure to normalize for capitalization and to ignore stop words like "the", "for", etc. To keep things simple, we don't care about the ordering of words that have equal frequencies. This computational task is known as term frequency.

Here is an example of an input file and corresponding output after computing the term frequency:

Input:
 White tigers live mostly in India
 Wild lions live mostly in Africa
Output:
 live - 2
 mostly - 2
 africa - 1
 india - 1
 lions - 1
 tigers - 1
 white - 1
 wild - 1
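
As a rough illustration of the task, here is a minimal Python sketch that reproduces the example above. It is one possible approach under stated assumptions, not the book's own code: the function name `term_frequency`, the tiny inline `STOP_WORDS` set, and the choice to treat words as runs of the letters a-z are all hypothetical. The task leaves the order of equal-frequency words open; this sketch breaks ties alphabetically only so that its output is repeatable.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a realistic list would contain many more entries.
STOP_WORDS = {"the", "for", "in", "a", "an", "and", "of", "to"}


def term_frequency(text, n=25, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    # Normalize capitalization, then treat each run of letters as a word
    # (an assumption: digits and punctuation act as separators).
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    # Sort by decreasing count; ties are broken alphabetically, which is
    # a choice made here for determinism, not a requirement of the task.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


sample = "White tigers live mostly in India\nWild lions live mostly in Africa\n"
for word, count in term_frequency(sample):
    print(f"{word} - {count}")
```

To process a file instead of an in-memory string, the text could be read first with `open(path).read()` and passed to the same function.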
