O'Reilly logo

Mastering Python for Data Science by Samir Madhavan

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Stemming and lemmatization

Text documents can contain words in different forms, such as play, playing, and played. They are similar and they have a common root.

Stemming and lemmatization are techniques that are used to find these common roots. Finding the roots will help us count, play, playing, and played as a single entity as all the words talk about play.

Stemming is more of a crude form of arriving at the root of a word; so, in the case of the preceding example, playing would be reduced to play. Lemmatization brings into context words, such as worse and bad, that can have a common bad root.

Stemming

Stemming is a process of reducing a word to its root form. The root form is not a word by itself, but words can be formed by adding the right suffix ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required