February 2018
Beginner to intermediate
364 pages
10h 32m
English
NLTK provides an implementation of the Porter stemming algorithm in its PorterStemmer class. An instance of this can easily be created by the following code:
>>> from nltk.stem import PorterStemmer>>> pst = PorterStemmer()>>> pst.stem('fishing')'fish'
The script in the 07/03_stemming.py file applies the Porter and Lancaster stemmers to the first sentence of our input file. The primary section of the code performing the stemming is the following:
pst = PorterStemmer()lst = LancasterStemmer()print("Stemming results:")for token in regexp_tokenize(sentences[0], pattern='\w+'): print(token, pst.stem(token), lst.stem(token))
And this results in the following output:
Stemming results:We We weare are arseeking seek seekdevelopers develop ...
Read now
Unlock full access