May 2018
Beginner to intermediate
384 pages
10h 19m
English
Lemmatization differs from stemming. Stemming generally strips trailing characters from a word in the expectation that what remains is the correct base word. Sometimes, however, it removes suffixes that add meaning to the word. Lemmatization tries to overcome this limitation. It looks for the base form of the word, called the lemma, using a vocabulary of known words and a morphological analysis of the word. It uses the WordNet lexical database to look up the correct base form. This approach has a limitation of its own: it requires part-of-speech tagging, otherwise it treats every word as a noun.
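The contrast can be made concrete with a small sketch. It is a toy illustration only: the suffix list, the `naive_stem` and `toy_lemmatize` functions, and the six-entry `LEMMA_TABLE` are hypothetical stand-ins invented for this example, not WordNet and not any library's actual algorithm. The lookup defaults to the noun tag to mimic the limitation described above.

```python
# Toy illustration only: the suffix rules and the tiny vocabulary below are
# hypothetical stand-ins, not WordNet and not any library's real algorithm.

SUFFIXES = ("ing", "es", "ed", "s")  # checked in this order


def naive_stem(word):
    """Strip the first matching suffix without checking the result is a word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# (surface form, part-of-speech tag) -> lemma
LEMMA_TABLE = {
    ("studies", "n"): "study",
    ("studies", "v"): "study",
    ("meeting", "n"): "meeting",
    ("meeting", "v"): "meet",
    ("was", "v"): "be",
    ("better", "a"): "good",
}


def toy_lemmatize(word, pos="n"):
    """Look up the lemma for (word, pos); return the word unchanged if unknown.

    The tag defaults to "n" (noun), so an untagged verb is looked up as a noun.
    """
    return LEMMA_TABLE.get((word, pos), word)


if __name__ == "__main__":
    for word, pos in [("studies", "n"), ("meeting", "n"), ("meeting", "v"), ("was", "v")]:
        print(word, pos, naive_stem(word), toy_lemmatize(word, pos))
    # Without a tag, "was" is treated as a noun and comes back unchanged.
    print(toy_lemmatize("was"))
```

Here the stemmer turns "studies" into the non-word "studi" and reduces the noun "meeting" to "meet", while the lookup keeps "meeting" as a noun and returns "meet" only when it is tagged as a verb. NLTK's `WordNetLemmatizer.lemmatize(word, pos)` behaves analogously in that its `pos` argument defaults to noun.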