13.4. Choosing Greedy or Nongreedy Matches
Problem
You want your pattern to match the smallest possible string instead of the largest.
Solution
Place a ? after a quantifier to make just that portion of the pattern nongreedy:
// find all bolded sections
preg_match_all('#<b>.+?</b>#', $html, $matches);
Or, append the U pattern modifier to invert every quantifier in the pattern from greedy to nongreedy:
// find all bolded sections
preg_match_all('#<b>.+</b>#U', $html, $matches);
Discussion
By default, quantifiers in PHP regular expressions are what's known as greedy. This means a quantifier always tries to match as many characters as possible while still allowing the overall pattern to succeed.
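Applied to the bold-tag pattern from the Solution, the difference can be sketched as follows. The sample string in $html and the variable names $greedy and $nongreedy are assumptions made for illustration; they are not part of the original recipe.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$html = '<b>one</b> and <b>two</b>';

// Greedy: .+ runs on to the last </b>, so both bold sections
// end up inside a single match.
preg_match_all('#<b>.+</b>#', $html, $greedy);

// Nongreedy: .+? stops at the first </b> it can reach,
// so each bold section is matched separately.
preg_match_all('#<b>.+?</b>#', $html, $nongreedy);

print_r($greedy[0]);    // one element: the entire string
print_r($nongreedy[0]); // two elements: <b>one</b> and <b>two</b>
```

With input like this, the greedy version is rarely what you want, which is why the Solution uses the nongreedy form.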
For example, take the pattern p.*, which matches a p and then 0 or more characters, and match it against the string php. A greedy regular expression finds one match, because after it grabs the opening p, it continues on and also matches the hp. A nongreedy regular expression (p.*?), on the other hand, finds a pair of matches. After it matches the opening p, the nongreedy quantifier is satisfied with zero characters, so the first match ends right there and leaves the hp unconsumed. The h cannot begin a match, so a second match then goes ahead and takes the closing p.
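Passing a third argument to preg_match_all() makes it possible to inspect the text each variant actually captures, not just how many matches it finds. This is a minimal sketch; the variable names are hypothetical, and the last call is included only to illustrate that under the U modifier a trailing ? makes a quantifier greedy again.

```php
<?php
// Collect the matched text for each variant of the pattern.
preg_match_all('/p.*/', 'php', $greedy);        // default greedy
preg_match_all('/p.*?/', 'php', $lazy);         // nongreedy via ?
preg_match_all('/p.*/U', 'php', $ungreedy);     // nongreedy via U
preg_match_all('/p.*?/U', 'php', $invertedBack); // U plus ? is greedy again

echo implode(',', $greedy[0]), "\n";       // php
echo implode(',', $lazy[0]), "\n";         // p,p
echo implode(',', $ungreedy[0]), "\n";     // p,p
echo implode(',', $invertedBack[0]), "\n"; // php
```

Each nongreedy match consists of a single p, because .*? is allowed to match zero characters.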
The following code shows that the greedy match finds only one hit; the nongreedy ones find two:
print preg_match_all('/p.*/', "php") . "\n"; // greedy
print preg_match_all('/p.*?/', "php") . "\n"; // nongreedy
print preg_match_all('/p.*/U', "php") . "\n"; // nongreedy
1
2
2
Greedy matching is also known as maximal matching and nongreedy matching can be called minimal matching, because these options ...