
Practical Text Mining with Perl

Name: Practical Text Mining with Perl
Author: Roger Bilisoly
ISBN: 9780470176436
Publisher: Wiley
Published: August 2008
Level: Intermediate to advanced
Length: 320 pages
Language: English


COVER
SERIES TITLE
TITLE
COPYRIGHT PAGE
DEDICATION
LIST OF FIGURES
LIST OF TABLES
PREFACE
ACKNOWLEDGMENTS
CHAPTER 1: INTRODUCTION
1.1 OVERVIEW OF THIS BOOK
1.2 TEXT MINING AND RELATED FIELDS
1.3 ADVICE FOR READING THIS BOOK
CHAPTER 2: TEXT PATTERNS
2.1 INTRODUCTION
2.2 REGULAR EXPRESSIONS
2.3 FINDING WORDS IN A TEXT
2.4 DECOMPOSING POE’S “THE TELL-TALE HEART” INTO WORDS
2.5 A SIMPLE CONCORDANCE
2.6 FIRST ATTEMPT AT EXTRACTING SENTENCES
2.7 REGEX ODDS AND ENDS
2.8 REFERENCES
PROBLEMS
CHAPTER 3: QUANTITATIVE TEXT SUMMARIES
3.1 INTRODUCTION
3.2 SCALARS, INTERPOLATION, AND CONTEXT IN PERL
3.3 ARRAYS AND CONTEXT IN PERL
3.4 WORD LENGTHS IN POE’S “THE TELL-TALE HEART”
3.5 ARRAYS AND FUNCTIONS
3.6 HASHES
3.7 TWO TEXT APPLICATIONS
3.8 COMPLEX DATA STRUCTURES
3.9 REFERENCES
3.10 FIRST TRANSITION
PROBLEMS
CHAPTER 4: PROBABILITY AND TEXT SAMPLING
4.1 INTRODUCTION
4.2 PROBABILITY
4.3 CONDITIONAL PROBABILITY
4.4 MEAN AND VARIANCE OF RANDOM VARIABLES
4.5 THE BAG-OF-WORDS MODEL FOR POE’S “THE BLACK CAT”
4.6 THE EFFECT OF SAMPLE SIZE
4.7 REFERENCES
PROBLEMS
CHAPTER 5: APPLYING INFORMATION RETRIEVAL TO TEXT MINING
5.1 INTRODUCTION
5.2 COUNTING LETTERS AND WORDS
5.3 TEXT COUNTS AND VECTORS
5.4 THE TERM-DOCUMENT MATRIX APPLIED TO POE
5.5 MATRIX MULTIPLICATION
5.6 FUNCTIONS OF COUNTS
5.7 DOCUMENT SIMILARITY
5.8 REFERENCES
PROBLEMS
CHAPTER 6: CONCORDANCE LINES AND CORPUS LINGUISTICS
6.1 INTRODUCTION
6.2 SAMPLING
6.3 CORPUS AS BASELINE
6.4 CONCORDANCING
6.5 COLLOCATIONS AND CONCORDANCE LINES
6.6 APPLICATIONS WITH REFERENCES
6.7 SECOND TRANSITION
PROBLEMS
CHAPTER 7: MULTIVARIATE TECHNIQUES WITH TEXT
7.1 INTRODUCTION
7.2 BASIC STATISTICS
7.3 BASIC LINEAR ALGEBRA
7.4 PRINCIPAL COMPONENTS ANALYSIS
7.5 TEXT APPLICATIONS
7.6 APPLICATIONS AND REFERENCES
PROBLEMS
CHAPTER 8: TEXT CLUSTERING
8.1 INTRODUCTION
8.2 CLUSTERING
8.3 A NOTE ON CLASSIFICATION
8.4 REFERENCES
8.5 LAST TRANSITION
PROBLEMS
CHAPTER 9: A SAMPLE OF ADDITIONAL TOPICS
9.1 INTRODUCTION
9.2 PERL MODULES
9.3 OTHER LANGUAGES: ANALYZING GOETHE IN GERMAN
9.4 PERMUTATION TESTS
9.5 REFERENCES
APPENDIX A: OVERVIEW OF PERL FOR TEXT MINING
A.1 BASIC DATA STRUCTURES
A.2 OPERATORS
A.3 BRANCHING AND LOOPING
A.4 A FEW PERL FUNCTIONS
A.5 INTRODUCTION TO REGULAR EXPRESSIONS
APPENDIX B: SUMMARY OF R USED IN THIS BOOK
B.1 BASICS OF R
B.2 THIS BOOK’S R CODE
REFERENCES
INDEX


Acknowledgments

Thanks to the Department of Mathematical Sciences of Central Connecticut State University (CCSU) for an environment that provided me the time and resources to write this book. Thanks to Dr. Daniel Larose, Director of the Data Mining Program at CCSU, for encouraging me to develop Stat 527, an introductory course on text mining. He also first suggested that I write a data mining book, which eventually became this text.

Some of the ideas in chapters 2, 3, and 5 arose as I developed and taught text mining examples for Stat 527. Thanks to Kathy Albers, Judy Spomer, and Don Wedding for taking independent studies on text mining, which helped to develop this class. Thanks again to Judy Spomer for comments on a draft of chapter 2.

Thanks to Gary Buckles and Gina Patacca for their hospitality over the years. In particular, my visits to The Ohio State University’s libraries would have been much less enjoyable if not for them.

Thanks to Dr. Edward Force for reading the section on text mining German. Thanks to Dr. Krishna Saha for reading over my R code and giving suggestions for improvement. Thanks to Dr. Nell Smith and David LaPierre for reading the entire manuscript and making valuable suggestions on it.

Thanks to Paul Petralia, senior editor at Wiley Interscience, who let me write the book that I wanted to write.

The notation and figures in my section 4.6.1 are based on section 1.1 and figure 1.1 of Word Frequency Distributions by R. Harald Baayen, which is volume 18 of the ...

Publisher Resources

ISBN: 9781118210505