Chapter 21. Parsing

Damian Conway

So who cares about parsing anyway?

Er, well, humans do. Our brains seem to be hard-wired for a syntactic view of the world, and we strive (often unreasonably) to find or impose grammatical structures in our lives. Homo sapiens is a species evolved for language (or possibly, by it). We are compulsive and incessant parsers: of written text, spoken words, our children’s faces, dogs’ tails, politicians’ body language,[9] the simple grammar of traffic lights, and the complex syntax of our own internal aches and pains. You’re parsing right now: shapes into letters, letters into words, words into sentences, sentences into messages, messages into (dis)belief!

It’s not surprising that programmers (many of whom were once human) should be concerned with parsing, too. If you use any of the modules in the Pod::, Date::, HTML::, CGI::, LWP::, or Getopt:: hierarchies, the Expect module, TeX::Hyphen, Text::Refer, ConfigReader, PGP, Term::ReadLine, CPAN, Mail::Tools, or one of the database interface modules, or even if you just read in a line at a time and match it against some regular expressions, then you’re parsing. Sometimes in the privacy and comfort of your own home.

Each of those modules contains a chunk of custom-made, carefully tuned, walnut-veneered code, which takes ugly raw data and sculpts it into finely chiseled information you can actually use.

Of course, if the CPAN doesn’t supply a parsing system for the particular ugly raw data you need to process, ...
