Computer Science & Perl Programming by Jon Orwant

Chapter 21. Parsing

Damian Conway

So who cares about parsing anyway?

Er, well, humans do. Our brains seem to be hard-wired for a syntactic view of the world, and we strive (often unreasonably) to find or impose grammatical structures in our lives. Homo sapiens is a species evolved for language (or possibly, by it). We are compulsive and incessant parsers: of written text, spoken words, our children’s faces, dogs’ tails, politicians’ body language,[9] the simple grammar of traffic lights, and the complex syntax of our own internal aches and pains. You’re parsing right now: shapes into letters, letters into words, words into sentences, sentences into messages, messages into (dis)belief!

It’s not surprising that programmers (many of whom were once human) should be concerned with parsing, too. If you use any of the modules in the Pod::, Date::, HTML::, CGI::, LWP::, or Getopt:: hierarchies, the Expect module, TeX::Hyphen, Text::Refer, ConfigReader, PGP, Term::ReadLine, CPAN, Mail::Tools, or one of the database interface modules, or even if you just read in a line at a time and match it against some regular expressions, then you’re parsing. Sometimes in the privacy and comfort of your own home.

Each of those modules contains a chunk of custom-made, carefully tuned, walnut-veneered code, which takes ugly raw data and sculpts it into finely chiseled information you can actually use.

Of course, if the CPAN doesn’t supply a parsing system for the particular ugly raw data you need to process, ...
