Stemming words
Stemming is a technique to remove affixes from a word, ending up with the stem. For example, the stem of cooking is cook, and a good stemming algorithm knows that the ing suffix can be removed. Stemming is most commonly used by search engines for indexing words. Instead of storing all forms of a word, a search engine can store only the stems, greatly reducing the size of index while increasing retrieval accuracy.
One of the most common stemming algorithms is the Porter stemming algorithm by Martin Porter. It is designed to remove and replace well-known suffixes of English words, and its usage in NLTK will be covered in the next section.
Note
The resulting stem is not always a valid word. For example, the stem of cookery is cookeri ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access