Perl Cookbook by Nathan Torkington, Tom Christiansen

Greedy and Non-Greedy Matches

Problem

You have a pattern with a greedy quantifier like *, +, ?, or {}, and you want to stop it from being greedy.

A classic case of this is the naïve substitution to remove tags from HTML. Although it looks appealing, s#<TT>.*</TT>##gsi actually deletes everything from the first opening TT tag through the last closing one. This would turn "Even <TT>vi</TT> can edit <TT>troff</TT> effectively." into "Even  effectively.", completely changing the meaning of the sentence!

Solution

Replace the offending greedy quantifier with the corresponding non-greedy version. That is, change *, +, ?, and {} into *?, +?, ??, and {}?, respectively.
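As a minimal sketch of that change, the script below runs the substitution from the Problem section on the same sample sentence, once with .* and once with .*?. The variable names ($text, $greedy, $stingy) are illustrative and not from the original text.

```perl
use strict;
use warnings;

my $text = "Even <TT>vi</TT> can edit <TT>troff</TT> effectively.";

# Greedy: .* runs on to the last </TT>, so one match swallows
# everything between the first <TT> and the final </TT>.
(my $greedy = $text) =~ s#<TT>.*</TT>##gsi;

# Non-greedy: .*? stops at the nearest </TT>, so each
# <TT>...</TT> span is matched and deleted separately.
(my $stingy = $text) =~ s#<TT>.*?</TT>##gsi;

print "greedy: $greedy\n";   # greedy: Even  effectively.
print "stingy: $stingy\n";   # stingy: Even  can edit  effectively.
```

Note that both versions delete the tagged text along with the tags. The non-greedy form only limits how far each individual match extends.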

Discussion

Perl has two sets of quantifiers: the maximal ones *, +, ?, and {} (sometimes called greedy) and the minimal ones *?, +?, ??, and {}? (sometimes called stingy). For instance, given the string "Perl is a Swiss Army Chainsaw!", the pattern /(r.*s)/ matches "rl is a Swiss Army Chains" whereas /(r.*?s)/ matches "rl is".
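The two matches quoted above can be checked with a short script. This is only an illustrative sketch, and the variable names are assumptions rather than part of the original text.

```perl
use strict;
use warnings;

my $string = "Perl is a Swiss Army Chainsaw!";

# Maximal: .* extends as far as it can while still leaving an "s" to match.
my ($maximal) = $string =~ /(r.*s)/;

# Minimal: .*? extends only as far as needed to reach the first "s".
my ($minimal) = $string =~ /(r.*?s)/;

print "maximal: $maximal\n";   # maximal: rl is a Swiss Army Chains
print "minimal: $minimal\n";   # minimal: rl is
```

Both patterns begin matching at the same "r" in "Perl". The quantifier changes only where the match ends.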

With maximal quantifiers, when you ask to match a variable number of times, such as zero or more times for * or one or more times for +, the matching engine prefers the “or more” portion of that description. Thus /foo.*bar/ matches from the first "foo" up to the last "bar" in the string, rather than merely the next "bar", as some might expect. To make any of the regular expression repetition operators prefer stingy matching over greedy matching, add an extra ?. So *? matches zero or more times, but rather ...
