Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Removing stop words

Commonly used English words such as the, is, and he are generally called stop words. Other languages have similar high-frequency words that fall under the same category. Stop word removal is another common preprocessing step in NLP applications. In this step, we remove words that carry little meaning for the document, such as articles and pronouns; examples include a, an, he, and her. Because these words occur frequently throughout almost any text, they contribute little by themselves to NLP tasks such as text categorization or search. Let us look at a sample of stop words in the English language, in the following code:

>>> from nltk.corpus import stopwords
>>> sw_l = stopwords.words('english')
...
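
To make the removal step itself concrete, the following is a minimal sketch rather than the book's own code. It assumes a small, hand-picked stop word set (SAMPLE_STOP_WORDS, a hypothetical name) and a simple regex tokenizer, so that it runs without downloading any corpus. The helper remove_stop_words is likewise introduced only for illustration:

```python
import re

# Hypothetical, hand-picked subset of English stop words used only for this
# illustration; NLTK's stopwords.words('english') provides a much fuller list.
SAMPLE_STOP_WORDS = {"a", "an", "the", "is", "he", "her", "and", "of", "in", "to"}


def remove_stop_words(text, stop_words=SAMPLE_STOP_WORDS):
    """Return the lowercased word tokens of `text` that are not stop words."""
    # Simple regex tokenizer (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in stop_words]


filtered = remove_stop_words("He is reading a book in the library")
print(filtered)  # ['reading', 'book', 'library']
```

In practice, the sample set could be swapped for set(stopwords.words('english')) once the NLTK stop word corpus is available locally. Converting the list to a set keeps each membership check fast.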
