Advanced Perl Programming, 2nd Edition

by Simon Cozens
June 2005
Content level: Intermediate to advanced
298 pages
7h 28m
English
O'Reilly Media, Inc.

Chapter 2. Parsing Techniques

One thing Perl is particularly good at is throwing data around. There are two types of data in the world: regular, structured data and everything else. The good news is that regular data (colon-delimited, tab-delimited, and fixed-width files) is really easy to parse with Perl. We won’t deal with that here. The bad news is that regular, structured data is the minority.
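For contrast, here is roughly how little code regular data needs. This is a minimal sketch using only core Perl (`split` for delimited fields, `unpack` with `A` templates for fixed-width ones); the sample records and the 8/4/6 column widths are invented for illustration.

```perl
use strict;
use warnings;

# Colon-delimited record in the style of /etc/passwd (sample data is made up).
my $line = "alice:x:1000:1000:Alice Example:/home/alice:/bin/sh";
my ($user, $pw, $uid, $gid, $gecos, $home, $shell) = split /:/, $line;

# Tab-delimited record: name, quantity, price.
my @cols = split /\t/, "apple\t3\t0.50";

# Fixed-width record (assumed layout): 8-char name, 4-char quantity, 6-char price.
# The "A" template strips trailing spaces from each field.
my ($name, $qty, $price) = unpack "A8 A4 A6", "apple   3   0.50  ";

print "$user uses $shell; $name: $qty at $price\n";
```

Everything the parser needs to know (the delimiter or the column widths) is fixed in advance, which is exactly what the less regular data in the rest of this chapter lacks.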

If the data isn’t regular, then we need more advanced techniques to parse it. There are two major types of parser for this kind of less predictable data. The first is a bottom-up parser. Let’s say we have an HTML page. We can split the data up into meaningful chunks or tokens—tags and the data between tags, for instance—and then reconstruct what each token means. See Figure 2-1. This approach is called bottom-up parsing because it starts with the data and works toward a parse.

Figure 2-1. Bottom-up parsing of HTML
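The bottom-up approach can be sketched in a few lines of Perl: first chop the input into tokens (tags, and the text between them), then look at each token to decide what it means. This is a toy that assumes simple, well-formed input; the `tokenize` and `classify` names are hypothetical, and it ignores comments, `<script>` bodies, and attribute values containing `>`. For real HTML, a CPAN module such as HTML::Parser is the better choice.

```perl
use strict;
use warnings;

# Step 1: split the input into tokens. Each token is [TYPE, VALUE],
# where TYPE is 'TAG' (anything from '<' to '>') or 'TEXT' (a run of non-'<').
# Hypothetical helper; stops silently at a '<' that is never closed.
sub tokenize {
    my ($html) = @_;
    my @tokens;
    while ($html =~ /\G(?:(<[^>]*>)|([^<]+))/gc) {
        if (defined $1) { push @tokens, [ TAG  => $1 ] }
        else            { push @tokens, [ TEXT => $2 ] }
    }
    return @tokens;
}

# Step 2: reconstruct what each token means, one token at a time.
sub classify {
    my ($type, $value) = @{ $_[0] };
    return "text: $value"  if $type eq 'TEXT';
    return "end tag: $1"   if $value =~ m{^</\s*(\w+)};
    return "start tag: $1" if $value =~ m{^<\s*(\w+)};
    return "other: $value";
}

my @tokens = tokenize('<html><b>Hello</b> world</html>');
print classify($_), "\n" for @tokens;
```

The tokenizer never needs to know how tags nest; any structure has to be rebuilt afterward from the flat stream of tokens, which is what makes this approach bottom-up.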

The other major type of parser is a top-down parser. This starts with some ideas of what an HTML file ought to look like: it has an <html> tag at the start and an </html> at the end, with some stuff in the middle. The parser can find that pattern in the document and then look to see what the stuff in the middle is likely to be. See Figure 2-2. This is called a top-down parse because it starts with all the possible parses and works down until it matches the actual contents of the document. ...
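A top-down parser can be sketched as a small recursive-descent parser, one common way of writing top-down parsers. It starts from the expectation that a document is `<html>`, some content, then `</html>`, and recursively works out what the content in the middle must be. The grammar in the comments, the `parse_document` and `parse_content` names, and the tree layout (hashes with `tag` and `children` keys) are all assumptions for illustration; attributes, empty elements such as `<br>`, and most real-world HTML are not handled.

```perl
use strict;
use warnings;

# Assumed, heavily simplified grammar:
#   document := '<html>' content '</html>'
#   content  := ( text | element )*
#   element  := '<' NAME '>' content '</' NAME '>'
#   text     := [^<]+

sub parse_document {
    my ($html) = @_;
    my $s = \$html;                        # share one string (and its pos()) across calls
    $$s =~ /\G\s*<html>/gc     or die "expected <html>\n";
    my $children = parse_content($s);      # now work out what the middle is
    $$s =~ /\G<\/html>\s*\z/gc or die "expected </html>\n";
    return { tag => 'html', children => $children };
}

sub parse_content {
    my ($s) = @_;
    my @children;
    while (1) {
        if ($$s =~ /\G([^<]+)/gc) {
            push @children, $1;            # text node
        }
        elsif ($$s =~ /\G<(\w+)>/gc) {
            my $tag  = $1;
            my $kids = parse_content($s);  # recurse into the element's body
            $$s =~ /\G<\/\Q$tag\E>/gc or die "expected </$tag>\n";
            push @children, { tag => $tag, children => $kids };
        }
        else {
            last;                          # a closing tag: let the caller match it
        }
    }
    return \@children;
}

my $tree = parse_document('<html><p>Hello <b>world</b></p></html>');
print $tree->{children}[0]{tag}, "\n";     # prints "p"
```

Each `/\G.../gc` match either consumes input at the current position or leaves `pos()` untouched, so when one expectation fails the parser can try the next alternative from the same spot; a document that matches none of them is rejected with an error instead of being parsed.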

ISBN: 0596004567