February 2018
Beginner to intermediate
364 pages
10h 32m
English
Lemmatization is available in NLTK through the WordNetLemmatizer class. This class uses WordNet, a large lexical database of English, to make its decisions. The code in the 07/04_lemmatization.py file extends the previous stemming example to also calculate the lemma of each word. The relevant code is the following:
from nltk.stem import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import WordNetLemmatizer

pst = PorterStemmer()
lst = LancasterStemmer()
wnl = WordNetLemmatizer()

print("Stemming / lemmatization results")
# regexp_tokenize and sentences come from the earlier part of the example
for token in regexp_tokenize(sentences[0], pattern=r'\w+'):
    print(token, pst.stem(token), lst.stem(token), wnl.lemmatize(token))
And it results in the following output:
Stemming ...
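To make the contrast between the two approaches concrete, here is a minimal sketch that does not use NLTK at all. The suffix list, the lookup table, and the function names toy_stem and toy_lemmatize are all invented for illustration; they are far simpler than the Porter and Lancaster algorithms or the WordNet database, and are not how NLTK implements either step.

```python
# Toy illustration only: these rules and this lookup table are invented for
# the example and are much simpler than NLTK's stemmers or WordNet.

SUFFIXES = ("ing", "ies", "ed", "es", "s")  # hypothetical rule list

# Hypothetical mini-dictionary standing in for a lexical database.
LEMMA_TABLE = {
    "geese": "goose",
    "mice": "mouse",
    "studies": "study",
    "cars": "car",
}


def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        # Leave very short words alone so that little more than the suffix remains.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Return the dictionary form if the word is known, else the word unchanged."""
    return LEMMA_TABLE.get(word, word)


for token in ["studies", "geese", "cars", "running"]:
    print(token, toy_stem(token), toy_lemmatize(token))
```

In this sketch, the stemmer turns "studies" into "stud", which is not a word, while the lookup returns "study". The irregular plural "geese" is untouched by the suffix rules but resolved by the table. A word missing from the table, such as "running", comes back unchanged. NLTK's WordNetLemmatizer behaves similarly in that last case: it returns the input when it finds no entry, and it treats each word as a noun unless a pos argument is supplied.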