Data Hacking with NLTK

NLTK is written so that you can explore data easily and begin to form some impressions without much upfront investment. Before skipping ahead, though, consider following along with the interpreter session in Example 7-2 to get a feel for some of the powerful functionality that NLTK provides right out of the box. Don’t forget that you can use the built-in help function whenever you need more information. For example, help(nltk) displays the documentation for the NLTK package.

Also keep in mind that not all of the functionality from the interpreter session is intended for incorporation into production software. Some methods print their results to standard output instead of returning them in a data structure such as a list, so you cannot directly capture and process what they produce. In that regard, methods such as concordance (defined on the nltk.text.Text class) are considered “demo functionality.” Speaking of which, many of NLTK’s modules have a demo function that you can call to get an idea of how to use the functionality they provide, and the source code for these demos is a great starting point for learning new APIs. For example, you could run nltk.text.demo() in the interpreter to get additional insight into the capabilities of the nltk.text module. We’ll take a closer look at how some of this demonstration functionality works over the coming pages.
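If you do want to keep the output of a demo-style method such as concordance, one workaround is to redirect standard output temporarily while it runs. The following is a minimal sketch, assuming an NLTK 3.x release in which concordance writes its results with print and returns nothing; the sample sentence is invented for illustration, and contextlib.redirect_stdout is a general Python standard library tool, not part of NLTK.

```python
import io
from contextlib import redirect_stdout

import nltk

# Invented sample text, tokenized with split() like the rest of this chapter
tokens = "The cat sat on the mat and the dog sat on the rug".split()
text = nltk.Text(tokens)

# concordance() prints to standard output, so capture stdout into a buffer
buffer = io.StringIO()
with redirect_stdout(buffer):
    text.concordance("sat")

# The printed lines are now available as an ordinary Python list of strings
captured_lines = buffer.getvalue().splitlines()
for line in captured_lines:
    print(repr(line))
```

This captures text, not structured results, so any further processing means parsing those strings; that is one reason such methods are better suited to exploration than to production code.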


The examples throughout this chapter, including the following interpreter session, use the split method to tokenize text. Chapter 8 introduces more sophisticated ...
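To see the limits of split-based tokenization, consider what it does with punctuation and capitalization. This is a standard-library-only sketch with invented sample text: calling split with no arguments breaks the string on runs of whitespace, and counting the resulting tokens shows that punctuation stays attached to neighboring words.

```python
from collections import Counter

# Invented sample text; split() with no arguments breaks on any run of whitespace
text = "The quick brown fox jumps over the lazy dog. The dog sleeps; the fox runs."
tokens = text.split()
print(tokens)

# Counting exposes the limits of whitespace tokenization:
# "dog." and "dog" (and "The" vs. "the") are treated as different tokens
counts = Counter(tokens)
print(counts["dog."], counts["dog"], counts["The"], counts["the"])
```

Because "dog." and "dog" are counted separately, frequency analyses built on split alone will undercount some terms, which is the trade-off this chapter accepts for simplicity.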

