Data Hacking with NLTK
NLTK is designed so that you can explore data easily and begin to form
impressions without much upfront investment. Before skipping ahead,
though, consider following along with the interpreter session in
Example 7-2 to get a feel for some of the powerful functionality that
NLTK provides right out of the box. Remember that you can use the
built-in help function to get more information whenever you need it.
For example, help(nltk) displays the documentation for the NLTK package.
Also keep in mind that not all of the functionality shown in the
interpreter session is intended for incorporation into production
software, because some methods write their results to standard output
instead of returning them in a data structure such as a list. In that
regard, methods such as nltk.Text.concordance are considered “demo
functionality.” Speaking of which, many of NLTK’s modules have a demo
function that you can call to get some idea of how to use the
functionality they provide, and the source code for these demos is a
great starting point for learning how to use new APIs. For example, you
could run nltk.text.demo() in the interpreter to get some additional
insight into the capabilities provided by the nltk.text module. We’ll
take a closer look at how some of this demonstration functionality works
over the coming pages.
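To see why print-only "demo functionality" is awkward in production code, here is a minimal sketch that uses only the standard library. The print_concordance function is a hypothetical stand-in written for this illustration, not part of NLTK; it mimics a method that prints its results and returns nothing. One possible workaround, assuming you cannot change the function itself, is to redirect standard output and collect the printed lines.

```python
import contextlib
import io


def print_concordance(tokens, word, width=2):
    """Hypothetical stand-in for a print-only "demo" method.

    Prints each occurrence of `word` with up to `width` tokens of context
    on either side and returns None, so the caller gets no data back.
    """
    for i, token in enumerate(tokens):
        if token.lower() == word.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            print(f"{left} [{token}] {right}".strip())


tokens = "the quick brown fox jumps over the lazy dog".split()

# Calling the function directly only writes to the screen.
result = print_concordance(tokens, "the")  # result is None

# Redirecting standard output lets us collect the printed lines in a list.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print_concordance(tokens, "the")
captured_lines = buffer.getvalue().splitlines()
```

Capturing output this way works, but parsing printed text is brittle, which is why functions that return a list or similar structure are usually a better fit for production software.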
Note
The examples throughout this chapter, including the following
interpreter session, use the split method to tokenize text. Chapter 8
introduces more sophisticated ...
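As a rough illustration of what tokenizing with split looks like, the sketch below splits a made-up sample sentence on whitespace and counts the resulting tokens. The sample text and variable names are assumptions chosen for this example. Note that split leaves punctuation attached to words, so "mat." and "mat" are counted as different tokens.

```python
from collections import Counter

# Sample text invented for this illustration.
text = "The cat sat on the mat. The mat was flat."

# str.split() with no arguments splits on runs of whitespace.
tokens = text.split()

# Count case-insensitive token frequencies.
counts = Counter(token.lower() for token in tokens)

print(tokens)
print(counts.most_common(3))
```

This is good enough for quick exploration, though the punctuation issue is one reason to reach for a more careful tokenizer later on.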