Greedy and Non-Greedy Matches

Problem

You have a pattern with a greedy quantifier like *, +, ?, or {}, and you want to stop it from being greedy.

A classic case of this is the naïve substitution to remove tags from HTML. Although it looks appealing, s#<TT>.*</TT>##gsi, actually deletes everything from the first open TT tag through the last closing one. This would turn "Even <TT>vi</TT> can edit <TT>troff</TT> effectively." into "Even effectively", completely changing the meaning of the sentence!

Solution

Replace the offending greedy quantifier with the corresponding non-greedy version. That is, change *, +, ?, and {} into *?, +?, ??, and {}?, respectively.

Discussion

Perl has two sets of quantifiers: the maximal ones *, +, ?, and {} (sometimes called greedy) and the minimal ones *?, +?, ??, and {}? (sometimes called stingy). For instance, given the string "Perl is a Swiss Army Chainsaw!", the pattern /(r.*s)/ matches "rl is a Swiss Army Chains" whereas /(r.*?s)/ matches "rl is".

With maximal quantifiers, when you ask to match a variable number of times, such as zero or more times for * or one or more times for +, the matching engine prefers the “or more” portion of that description. Thus /foo.*bar/ matches from the first "foo" up to the last "bar" in the string, rather than merely the next "bar", as some might expect. To make any of the regular expression repetition operators prefer stingy matching over greedy matching, add an extra ?. So *? matches zero or more times, but rather ...

Get Perl Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.