August 2014
Beginner to intermediate
304 pages
7h 10m
English
Stopwords are common words that generally do not contribute to the meaning of a sentence, at least for the purposes of information retrieval and natural language processing. These are words such as "the" and "a". Most search engines filter stopwords out of search queries and documents in order to save space in their index.
NLTK comes with a stopwords corpus that contains word lists for many languages. Be sure to unzip the data file, so NLTK can find these word lists at nltk_data/corpora/stopwords/.
We're going to create a set of all English stopwords, then use it to filter stopwords from a sentence with the help of the following code:
>>> from nltk.corpus import stopwords ...
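The listing above is cut off in this excerpt, so what follows is not the book's code but a minimal sketch of the same idea: build a set of English stopwords and drop any word that appears in it. The helper names `load_english_stopwords` and `filter_stopwords`, the sample sentence, and the small fallback set are assumptions made for illustration. The fallback is used only when NLTK or its stopwords corpus is not installed.

```python
def load_english_stopwords():
    """Return NLTK's English stopwords as a set.

    Falls back to a tiny hand-written set (an illustrative assumption,
    not NLTK's real list) if NLTK or its stopwords corpus is missing.
    """
    try:
        from nltk.corpus import stopwords
        # words('english') reads the English list from the stopwords corpus
        return set(stopwords.words('english'))
    except (ImportError, LookupError):
        # LookupError is what NLTK raises when the corpus data is not found
        return {'a', 'an', 'the', 'and', 'of', 'to', 'in', 'is'}


def filter_stopwords(words, stops):
    """Keep only the words whose lowercase form is not in `stops`."""
    return [word for word in words if word.lower() not in stops]


english_stops = load_english_stopwords()
words = ['The', 'cat', 'sat', 'on', 'a', 'mat']
print(filter_stopwords(words, english_stops))
```

Using a set rather than a list keeps each membership test close to constant time, which matters when filtering long documents. Lowercasing each word before the lookup is a design choice here, made because the word lists are expected to be lowercase.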