Lemmatization is available in NLTK through the WordNetLemmatizer class. This class uses WordNet, a large lexical database of English, to look up the dictionary form (lemma) of each word. The code in the 07/04_lemmatization.py file extends the previous stemming example to also compute the lemma of each word. The important part of the code is the following:
from nltk.stem import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import WordNetLemmatizer

pst = PorterStemmer()
lst = LancasterStemmer()
wnl = WordNetLemmatizer()

# regexp_tokenize and sentences carry over from the previous stemming example
print("Stemming / lemmatization results")
for token in regexp_tokenize(sentences[0], pattern=r'\w+'):
    print(token, pst.stem(token), lst.stem(token), wnl.lemmatize(token))
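To make the difference between stemming and lemmatization concrete, here is a minimal, self-contained sketch that does not use NLTK at all. The names toy_stem, toy_lemmatize, SUFFIXES, and LEMMAS are hypothetical and invented for this illustration, and the lookup table is a tiny hand-written stand-in for WordNet. The optional pos argument is modeled on the pos parameter of WordNetLemmatizer.lemmatize, which defaults to noun.

```python
# Toy illustration only: these tables and functions are hypothetical
# stand-ins, not NLTK's stemmers or the WordNet data.

SUFFIXES = ("ing", "ies", "es", "s", "ed")


def toy_stem(word):
    """Strip the first matching suffix, with no dictionary check."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# (word, part of speech) -> dictionary form
LEMMAS = {
    ("studies", "n"): "study",
    ("studies", "v"): "study",
    ("running", "v"): "run",
    ("mice", "n"): "mouse",
}


def toy_lemmatize(word, pos="n"):
    """Look the word up as the given part of speech; unknown words pass through."""
    return LEMMAS.get((word, pos), word)


for token in ["studies", "running", "mice"]:
    print(token, toy_stem(token), toy_lemmatize(token), toy_lemmatize(token, pos="v"))
```

The stemmer chops suffixes blindly and can return fragments that are not words, while the lemmatizer returns a real dictionary form only when it finds the word under the requested part of speech. In the NLTK listing above, wnl.lemmatize(token) is called without a part of speech, so each token is looked up as a noun.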
Running the 07/04_lemmatization.py script produces the following output:
Stemming ...