The FreqDist class encapsulates a dictionary that maps each word in a given list of words to its count. Load the Gutenberg text of Julius Caesar by William Shakespeare, then filter out stopwords and punctuation:
punctuation = set(string.punctuation)
filtered = [w.lower() for w in words if w.lower() not in sw and w.lower() not in punctuation]
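The filtering step can be tried in isolation with the self-contained sketch below. It assumes a small hand-written token list and stopword set in place of the corpus words and the `sw` set; `sample_words` and `sample_sw` are hypothetical stand-ins, not NLTK data.

```python
import string

# Hypothetical stand-ins for the corpus tokens (words) and the stopword set (sw).
sample_words = ["Caesar", ",", "the", "noble", "Brutus", "!", "The", "Caesar"]
sample_sw = {"the", "a", "of"}

punctuation = set(string.punctuation)

# Lowercase each token once, then drop stopwords and single punctuation marks.
lowered = [w.lower() for w in sample_words]
sample_filtered = [w for w in lowered
                   if w not in sample_sw and w not in punctuation]

print(sample_filtered)
```

Lowercasing before the membership tests means that "The" and "the" are both treated as the same stopword. Note that `set(string.punctuation)` contains single characters only, so a multi-character token such as "--" would pass through this filter.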
Create a FreqDist object and print the keys and values with the highest frequency:
fd = nltk.FreqDist(filtered)
print "Words", fd.keys()[:5]
print "Counts", fd.values()[:5]
The keys and values are printed as follows:
Words ['d', 'caesar', 'brutus', 'bru', 'haue']
Counts [215, 190, 161, 153, 148]
The first word in this list is of course not an English word, so we may need to add the heuristic ...
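To see the counting step on its own, the frequency count can be approximated with `collections.Counter` from the standard library. This is only a sketch on a hypothetical token list, not the FreqDist class itself. It asks for the highest counts explicitly with `most_common` rather than relying on the order in which keys are returned.

```python
from collections import Counter

# Hypothetical filtered tokens; in the text these come from the play.
tokens = ["caesar", "brutus", "caesar", "rome", "brutus", "caesar"]

counts = Counter(tokens)

# most_common(n) returns (word, count) pairs sorted by descending count.
top = counts.most_common(2)
top_words = [word for word, _ in top]
top_counts = [count for _, count in top]

print("Words", top_words)
print("Counts", top_counts)
```

Because the ranking is requested explicitly, the result does not depend on how the underlying dictionary happens to order its keys.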