Games, Diversions & Perl Culture by Jon Orwant

Chapter 19. Parsing Natural Language

Dan Brian

I See a Pattern Developing

Regular expressions are one of the triumphs of computer science. They often intimidate beginning programmers, but the ability to capture complex patterns of text in succinct representations makes them one of the most powerful tools at a developer’s disposal. Perl’s pattern-matching abilities are among the most advanced of any language, and they rank among the features that have made it one of the most popular languages ever created.

However, regexes can’t do everything. When the patterns in your data are complex, even Perl’s regular expressions fall short. Natural languages, like English, aren’t amenable to easy pattern matching: if you want to find sentences that express a particular sentiment, you need to first understand the grammar of the sentence, and regular expressions aren’t sufficient unless you throw a little intelligence into the mix. In this article, I’ll show how to do that.

We’ll make it possible to write code like this:

# create an array of everything cool
while ($sentence =~ /\G($something_that_rocks)/g) {
    push (@stuff_that_rocks, $1);
}
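
To see the loop run before any grammar is involved, here is a minimal sketch in which $something_that_rocks is a hypothetical stand-in: a plain alternation of words, not the grammar-aware pattern this chapter builds. The sample sentence is also invented for illustration. The sketch leaves out the \G anchor, because \G requires each match to begin exactly where the previous one ended, and the sample words are separated by other text.

```perl
use strict;
use warnings;

# Hypothetical stand-in pattern: a fixed list of words.
# The chapter's real pattern would encode grammatical roles instead.
my $something_that_rocks = qr/\b(?:Perl|regexes|grammar)\b/;

# Invented sample input.
my $sentence = "Perl and regexes rock, and so does grammar.";

my @stuff_that_rocks;

# In scalar context, /g resumes each search where the previous
# match ended, so the loop visits every match in turn.
while ($sentence =~ /($something_that_rocks)/g) {
    push @stuff_that_rocks, $1;
}

print join(", ", @stuff_that_rocks), "\n";   # prints "Perl, regexes, grammar"
```

With \G restored, the same loop would collect only "Perl" from this sentence, since the next match would have to start at the space that follows it. That anchor suits a pattern that consumes the sentence piece by piece, with no gaps between matches.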

Our notion of “what’s cool” can depend not just on simple character patterns, but upon the words in a sentence, and in particular their role in the sentence and relationships to one another. In brief, this article explores the application of regular expressions to grammar. Note that I am not suggesting another syntax for regular expressions. From ...
