5.6. Find Any Word Not Preceded by a Specific Word
Problem
You want to match any word that is not immediately preceded by the word “cat”, ignoring any whitespace, punctuation, or other nonword characters that come between.
Solution
Lookbehind you
Lookbehind lets you check if text appears before a given position. It works by instructing the regex engine to temporarily step backward in the string, checking whether something can be found ending at the position where you placed the lookbehind. See Recipe 2.16 if you need to brush up on the details of lookbehind.
The following three regexes use negative lookbehind, which looks like ‹(?<!⋯)›. Unfortunately, the regex flavors covered by this book differ in what kinds of patterns they allow you to place within lookbehind. As a result, the solutions end up working a bit differently in each case. Make sure to check out the Discussion of this recipe for further details.
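As one illustration of these flavor differences, the sketch below tries to compile the three regexes listed under the next heading with Python's standard re module, which accepts only fixed-width lookbehind. The helper name compiles and the labels in the dictionary are made up for this example; only re.compile and re.error come from the library.

```python
import re

# The three regexes from this recipe, keyed by an informal label
# (the labels are illustrative, not terminology from the recipe).
CANDIDATES = {
    "unbounded lookbehind": r"(?<!\bcat\W+)\b\w+",
    "bounded lookbehind": r"(?<!\bcat\W{1,9})\b\w+",
    "fixed-width lookbehind": r"(?<!\bcat)(?:\W+|^)(\w+)",
}

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern, re.IGNORECASE)
        return True
    except re.error:
        # Python rejects lookbehind whose width is not fixed.
        return False

for label, pattern in CANDIDATES.items():
    status = "accepted" if compiles(pattern) else "rejected"
    print(f"{label}: {status}")
```

Under Python, only the third pattern should compile, which is consistent with Python appearing solely in the flavor list of the third regex.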
Words not preceded by “cat”
(?<!\bcat\W+)\b\w+
Regex options: Case insensitive
Regex flavor: .NET
(?<!\bcat\W{1,9})\b\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE
(?<!\bcat)(?:\W+|^)(\w+)
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9
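A minimal sketch of how the third regex might be applied in Python, one of the flavors listed for it. Because that regex also consumes the nonword characters in front of each word, the word itself has to be read from capturing group 1 rather than from the overall match. The function name words_not_after_cat and the sample string (words separated by single spaces) are assumptions made for this illustration.

```python
import re

# Third regex from the list above; the word itself is captured in group 1.
NOT_AFTER_CAT = re.compile(r"(?<!\bcat)(?:\W+|^)(\w+)", re.IGNORECASE)

def words_not_after_cat(text):
    """Return the words in text that the third regex captures."""
    return [m.group(1) for m in NOT_AFTER_CAT.finditer(text)]

print(words_not_after_cat("The cat sat on the mat"))
# ['The', 'cat', 'on', 'the', 'mat']
```

Here "sat" is skipped because it directly follows "cat", while "cat" itself is returned because it follows "The".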
Simulate lookbehind
JavaScript and Ruby 1.8 do not support lookbehind at all, even though they do support lookahead. However, because the lookbehind for this problem appears at the very beginning of the regex, it is possible to perfectly simulate the lookbehind by splitting the regex into two parts, ...
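The two-part idea can be sketched as follows. The text above is cut off before it gives the actual split, so this is only one plausible reading, written in Python rather than JavaScript: one regex finds each word, and a second regex, anchored at the end of the text that precedes the match, plays the role of the lookbehind. The names WORD, PRECEDED_BY_CAT, and words_not_after_cat_simulated are hypothetical.

```python
import re

# Part 1: the main regex, which simply finds every word.
WORD = re.compile(r"\b\w+")
# Part 2: stands in for the lookbehind. It is tested against the text
# that comes before each match, so it is anchored to the end with $.
PRECEDED_BY_CAT = re.compile(r"\bcat\W+$", re.IGNORECASE)

def words_not_after_cat_simulated(text):
    """Return words that do not directly follow the word "cat"."""
    results = []
    for m in WORD.finditer(text):
        before = text[:m.start()]  # everything left of this word
        if not PRECEDED_BY_CAT.search(before):
            results.append(m.group())
    return results

print(words_not_after_cat_simulated("The cat,  sat on the mat"))
# ['The', 'cat', 'on', 'the', 'mat']
```

Since the second regex is an ordinary pattern rather than a lookbehind, it can use ‹\W+› freely, so any run of punctuation and whitespace between "cat" and the next word is handled.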