CHAPTER 9
A SAMPLE OF ADDITIONAL TOPICS
9.1 INTRODUCTION
Chapters 2 through 8 each have a theme. For example, regular expressions and data structures underlie Chapters 2 and 3, respectively, and Chapter 8 focuses on clustering. This chapter, however, covers three topics in less detail. The goal is to give the interested reader a few parting ideas as well as a few references for text mining.
9.2 PERL MODULES
Not only is Perl free, but a vast number of free packages have already been written for it. Because the details of obtaining these depend on the operating system, see the Comprehensive Perl Archive Network (CPAN) Web site http://cpan.perl.org/ [54] for instructions on how to download them.
Perl packages are called Perl modules, which are grouped together by topic. Each name typically has two or three parts, which are separated by double colons. The first part usually denotes a general topic, for example, Lingua, String, and Text. The second part is either a subtopic or a specific module. For instance, Lingua's subtopics are often specific languages; for example, Lingua::EN for English and Lingua::DE for German (DE stands for Deutsch). Our first example is from the former.
9.2.1 Modules for Number Words
Lingua::EN::Numbers [21] has a three-part name, where Lingua stands for language, and EN for English. The third part states what the package does in particular, and in this case, it involves numbers.
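The naming scheme can be made concrete with a short sketch. It splits a module name at the double colons and converts it to the relative file path that Perl looks for in the directories listed in @INC, then checks whether that file can be loaded. The helper name module_to_path is hypothetical and used here only for illustration; Text::Wrap is chosen as a second example on the assumption that a standard Perl installation includes it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the text): convert a module name such as
# Lingua::EN::Numbers to the relative path Lingua/EN/Numbers.pm, which is
# what Perl searches for in the directories listed in @INC.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

for my $module ('Lingua::EN::Numbers', 'Text::Wrap') {
    # The parts run from general topic to specific module.
    my @parts = split /::/, $module;
    my $path  = module_to_path($module);

    # require with a path string loads the file at run time;
    # eval traps the error raised when the module is not installed.
    my $status = eval { require $path; 1 } ? 'installed' : 'not installed';

    print join(' > ', @parts), " ($path): $status\n";
}
```

If a module is reported as not installed, it needs to be downloaded from CPAN before a program can load it with a use statement.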
CPAN gives information about each module and tells us there are two functions ...