July 2017
Intermediate to advanced
360 pages
8h 26m
English
Stemming is a process that is used to transform particular words (such as verbs or plurals) into their radical form so as to preserve the semantics without increasing the number of unique tokens. For example, if we consider the three expressions I run, He runs, and Running, they can be reduced into a useful (though grammatically incorrect) form: I run, He run, Run. In this way, we have a single token that defines the same concept (run), which, for clustering or classification purposes, can be used without any precision loss. There are many stemmer implementations provided by NLTK. The most common (and flexible) is SnowballStemmer, based on a multilingual algorithm:
from nltk.stem.snowball import SnowballStemmer>>> ess = SnowballStemmer('english', ...Read now
Unlock full access