Performing stemming

Stemming is the process of cutting down a token to its stem. Technically, it is the process or reducing inflected (and sometimes derived) words to their word stem - the base root form of the word.  As an example, the words fishing, fished, and fisher stem from the root word fish.  This helps to reduce the set of words being processed into a smaller base set that is more easily processed. 

The most common algorithm for stemming was created by Martin Porter, and NLTK provides an implementation of this algorithm in the PorterStemmer.  NLTK also provides an implementation of a Snowball stemmer, which was also created by Porter, and designed to handle languages other than English.  There is one more implementation provided ...

Get Python Web Scraping Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.