Practical Text Mining with Perl

Name: Practical Text Mining with Perl
Author: Roger Bilisoly
ISBN: 9780470176436

August 2008

Intermediate to advanced

320 pages

9h 14m

English

Wiley

COVER
SERIES TITLE
TITLE
COPYRIGHT PAGE
DEDICATION
LIST OF FIGURES
LIST OF TABLES
PREFACE
ACKNOWLEDGMENTS
CHAPTER 1: INTRODUCTION
1.1 OVERVIEW OF THIS BOOK
1.2 TEXT MINING AND RELATED FIELDS
1.3 ADVICE FOR READING THIS BOOK

CHAPTER 2: TEXT PATTERNS
2.1 INTRODUCTION
2.2 REGULAR EXPRESSIONS
2.3 FINDING WORDS IN A TEXT
2.4 DECOMPOSING POE’S “THE TELL-TALE HEART” INTO WORDS
2.5 A SIMPLE CONCORDANCE
2.6 FIRST ATTEMPT AT EXTRACTING SENTENCES
2.7 REGEX ODDS AND ENDS
2.8 REFERENCES
PROBLEMS

CHAPTER 3: QUANTITATIVE TEXT SUMMARIES
3.1 INTRODUCTION
3.2 SCALARS, INTERPOLATION, AND CONTEXT IN PERL
3.3 ARRAYS AND CONTEXT IN PERL
3.4 WORD LENGTHS IN POE’S “THE TELL-TALE HEART”
3.5 ARRAYS AND FUNCTIONS
3.6 HASHES
3.7 TWO TEXT APPLICATIONS
3.8 COMPLEX DATA STRUCTURES
3.9 REFERENCES
3.10 FIRST TRANSITION
PROBLEMS

CHAPTER 4: PROBABILITY AND TEXT SAMPLING
4.1 INTRODUCTION
4.2 PROBABILITY
4.3 CONDITIONAL PROBABILITY
4.4 MEAN AND VARIANCE OF RANDOM VARIABLES
4.5 THE BAG-OF-WORDS MODEL FOR POE’S “THE BLACK CAT”
4.6 THE EFFECT OF SAMPLE SIZE
4.7 REFERENCES
PROBLEMS

CHAPTER 5: APPLYING INFORMATION RETRIEVAL TO TEXT MINING
5.1 INTRODUCTION
5.2 COUNTING LETTERS AND WORDS
5.3 TEXT COUNTS AND VECTORS
5.4 THE TERM-DOCUMENT MATRIX APPLIED TO POE
5.5 MATRIX MULTIPLICATION
5.6 FUNCTIONS OF COUNTS
5.7 DOCUMENT SIMILARITY
5.8 REFERENCES
PROBLEMS

CHAPTER 6: CONCORDANCE LINES AND CORPUS LINGUISTICS
6.1 INTRODUCTION
6.2 SAMPLING
6.3 CORPUS AS BASELINE
6.4 CONCORDANCING
6.5 COLLOCATIONS AND CONCORDANCE LINES
6.6 APPLICATIONS WITH REFERENCES
6.7 SECOND TRANSITION
PROBLEMS

CHAPTER 7: MULTIVARIATE TECHNIQUES WITH TEXT
7.1 INTRODUCTION
7.2 BASIC STATISTICS
7.3 BASIC LINEAR ALGEBRA
7.4 PRINCIPAL COMPONENTS ANALYSIS
7.5 TEXT APPLICATIONS
7.6 APPLICATIONS AND REFERENCES
PROBLEMS

CHAPTER 8: TEXT CLUSTERING
8.1 INTRODUCTION
8.2 CLUSTERING
8.3 A NOTE ON CLASSIFICATION
8.4 REFERENCES
8.5 LAST TRANSITION
PROBLEMS

CHAPTER 9: A SAMPLE OF ADDITIONAL TOPICS
9.1 INTRODUCTION
9.2 PERL MODULES
9.3 OTHER LANGUAGES: ANALYZING GOETHE IN GERMAN
9.4 PERMUTATION TESTS
9.5 REFERENCES

APPENDIX A: OVERVIEW OF PERL FOR TEXT MINING
A.1 BASIC DATA STRUCTURES
A.2 OPERATORS
A.3 BRANCHING AND LOOPING
A.4 A FEW PERL FUNCTIONS
A.5 INTRODUCTION TO REGULAR EXPRESSIONS

APPENDIX B: SUMMARY OF R USED IN THIS BOOK
B.1 BASICS OF R
B.2 THIS BOOK’S R CODE

REFERENCES
INDEX

Content preview from Practical Text Mining with Perl

Appendix A

Overview of Perl for Text Mining

This appendix summarizes the basics of Perl in these areas: basic data structures, operators, branching and looping, functions, and regular expressions. The focus is on Perl’s text capabilities, and many references are made to code throughout this book.

The form of these code samples differs slightly from those in the rest of this book: to save space, the output is placed at the end of the computer code.

To run Perl, first download it from http://www.perl.org/ [45], following the instructions there. Second, type the statements into a file with the suffix .pl, for example, program.pl. Third, find out how to use your computer’s command line interface, which lets you type commands for execution. Fourth, type the statement below on the command line and then press the Enter key. The output will appear below it.

perl program.pl

Remember that Perl is case sensitive. For example, commands have to be in lowercase, and the three variables $cat, $Cat, and $CAT are all distinct. Finally, do not forget to use semicolons to end each statement.
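The points about case and semicolons can be seen in a few lines. The following is only a minimal sketch, not one of the book's code samples: the values assigned to the three variables and the helper name describe_cats are made up for illustration. As in the rest of this appendix, the output is placed at the end of the code.

```perl
use strict;
use warnings;

# Three distinct scalars whose names differ only in case.
# The assigned values are illustrative assumptions.
my $cat = "lowercase";
my $Cat = "capitalized";
my $CAT = "uppercase";

# Hypothetical helper that joins the three values into one string.
sub describe_cats {
    return "$cat $Cat $CAT";
}

# Commands such as print are written in lowercase,
# and every statement ends with a semicolon.
print describe_cats(), "\n";

# OUTPUT
# lowercase capitalized uppercase
```

Saving this as program.pl and running it with the command shown earlier prints one line in which each variable keeps its own value.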

A.1 BASIC DATA STRUCTURES

A programmer must be able to store and modify information, which is kept in scalar, array, and hash variables. We start with scalars, which store a single value, and their names always start with a dollar sign. First, consider the examples in code sample A.1, which demonstrates Perl’s two types of scalars, strings and numbers. If a string is used as a ...
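As a rough illustration of the three kinds of variables named above, here is a minimal sketch. It is not the book's code sample A.1, which is not reproduced in this preview. The word list, the counts, and the helper name summary are assumptions chosen for the example. The output is placed at the end of the code.

```perl
use strict;
use warnings;

# Scalars hold a single value, and their names start with a dollar sign.
my $title = "The Tell-Tale Heart";   # a string
my $year  = 1843;                    # a number

# Arrays (names start with @) hold ordered lists of scalars.
# Hashes (names start with %) map keys to values.
# These words and counts are made up for illustration.
my @words  = ("true", "nervous", "very");
my %counts = (true => 1, nervous => 2);

# Hypothetical helper that builds a one-line summary of the variables above.
sub summary {
    return "$title ($year): " . scalar(@words) . " words, nervous=$counts{nervous}";
}

print summary(), "\n";

# OUTPUT
# The Tell-Tale Heart (1843): 3 words, nervous=2
```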


Publisher Resources

ISBN: 9781118210505
