What This Book Covers
This book introduces the basic ideas of text mining, which is a group of techniques that extracts useful information from one or more texts. This is a practical book, one that focuses on applications and examples. Although some statistics and mathematics is required, it is kept to a minimum, and what is used is explained.
This book, however, does make one demand: it assumes that you are willing to learn to write simple programs using Perl. This programming language is explicitly designed to work with text. In addition, it is open-source software that is available over the Web for free. That is, you can download the latest full-featured version of Perl right now, and install it on all the computers you want without paying a cent.
Chapters 2 and 3 give the basics of Perl, including a detailed introduction to regular expressions, which is a text pattern matching methodology used in a variety of programming languages, not just Perl. For each concept there are several examples of how to use it to analyze texts. Initial examples analyze short strings, for example, a few words or a sentence. Later examples use text from a variety of literary works, for example, the short stories of Edgar Allan Poe, Charles Dickens’s A Christmas Carol, Jack London’s The Call of the Wild, and Mary Shelley’s Frankenstein. All the texts used here are part of the public domain, so you can download these for free, too. Finally, if you are interested in word games, Perl plus extensive ...