August 2014
Beginner to intermediate
304 pages
7h 10m
English
The WordListCorpusReader class is one of the simplest CorpusReader classes. It provides access to a file containing a list of words, one word per line. In fact, you have already used it: the stopwords corpus from Chapter 1, Tokenizing Text and WordNet Basics, which appeared in the Filtering stopwords in a tokenized sentence and Discovering word collocations recipes, is loaded with this class.
We need to start by creating a wordlist file. This could be a single-column CSV file, or just a plain text file with one word per line. Let's create a file named wordlist that looks like this:
nltk
corpus
corpora
wordnet
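If you would rather create the file from Python than in a text editor, the following is a minimal sketch. The temporary directory and the variable names are assumptions made for illustration; any directory you can write to will do.

```python
import os
import tempfile

# The four words from the example file above.
words = ["nltk", "corpus", "corpora", "wordnet"]

# Assumption: a throwaway directory stands in for wherever you keep the file.
target_dir = tempfile.mkdtemp()
path = os.path.join(target_dir, "wordlist")

# Write one word per line, ending the file with a newline.
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(words) + "\n")

# Read the file back to confirm the one-word-per-line layout.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Each entry in lines should correspond to exactly one word, which is the layout the reader expects.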
Now we can instantiate a WordListCorpusReader that will produce a list of words from our file. It takes two arguments: ...
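As a rough sketch of what this looks like in practice, the example below writes the wordlist file into a temporary directory and reads it back. It assumes the two arguments are the directory that contains the file and a list of file names, which matches the constructor of NLTK's corpus readers. The temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

from nltk.corpus.reader import WordListCorpusReader

with tempfile.TemporaryDirectory() as root:
    # Create the one-word-per-line file described above.
    with open(os.path.join(root, "wordlist"), "w", encoding="utf-8") as f:
        f.write("nltk\ncorpus\ncorpora\nwordnet\n")

    # Assumed arguments: the corpus root directory and a list of file names.
    reader = WordListCorpusReader(root, ["wordlist"])

    # words() returns every non-blank line of the file as one word.
    words = reader.words()

    # fileids() lists the files this reader knows about.
    fileids = reader.fileids()

print(words)
print(fileids)
```

With your own file, you would pass the directory that holds it instead of the temporary one, for example '.' if wordlist is in the current working directory.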