Data Hacking with NLTK
NLTK is written such that you can explore data very easily and begin
to form some impressions without a lot of upfront investment. Before
skipping ahead, though, consider
following along with the interpreter session in Example 7-2
to get a feel for some of the powerful functionality that NLTK provides
right out of the box. Don’t forget that you can use the built-in
help function to get more information whenever you need it. For example,
help(nltk) displays the documentation for the
NLTK package. Also keep in mind that not all of the functionality from the
interpreter session is intended for incorporation into production
software, since some of it writes its output to standard output rather than
returning it in a data structure such as a list. In that regard, methods
such as nltk.text.concordance are considered “demo
functionality.” Speaking of which, many of NLTK’s modules have a
demo function that you can call to get some idea
of how to use the functionality they provide, and the source code for
these demos is a great starting point for learning how to use new APIs.
For example, you could run
nltk.text.demo() in the interpreter to get some
additional insight into capabilities provided by the
nltk.text module. We’ll take a closer look at how some of
this demonstration functionality works over the coming pages.
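The difference between output that is only printed and output that can be reused is easy to see in a small sketch. The code below does not use NLTK: print_concordance and concordance_lines are hypothetical helpers written for illustration, and the sample sentence is made up. The first helper prints its matches the way a demo-style method does. The second shows one possible workaround, redirecting standard output into a buffer so that the same lines end up in a list.

```python
import contextlib
import io


def print_concordance(tokens, word, window=3):
    """Hypothetical demo-style helper (not NLTK's API): prints each match
    of `word` with up to `window` tokens of context on either side."""
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            print(f"{left} [{tok}] {right}".strip())


def concordance_lines(tokens, word, window=3):
    """Run the printing helper, but capture what it writes to standard
    output and return it as a list of lines."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_concordance(tokens, word, window)
    return buffer.getvalue().splitlines()


# Made-up sample text, tokenized on whitespace with str.split.
text = "NLTK makes it easy to explore text and NLTK ships with many demos"
tokens = text.split()

lines = concordance_lines(tokens, "nltk")
for line in lines:
    print(line)
```

In an NLTK session, the analogous printing call would be nltk.Text(tokens).concordance("nltk"). Capturing standard output is fine for exploration, but production code is usually better served by functions that return their results directly.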
The examples throughout this chapter, including the following
interpreter session, use Python’s built-in string
split method to tokenize
text. Chapter 8 introduces more sophisticated ...