5.6. Find Any Word Not Preceded by a Specific Word
Problem
You want to match any word that is not immediately preceded by the word “cat”, ignoring any whitespace, punctuation, or other nonword characters that come between.
Solution
Lookbehind lets you check if text appears before a given position. It works by instructing the regex engine to temporarily step backward in the string, checking whether something can be found ending at the position where you placed the lookbehind. See Recipe 2.16 if you need to brush up on the details of lookbehind.
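As a minimal illustration of the mechanics described above, the following Python sketch contrasts positive and negative lookbehind. The sample string and variable names are made up for this example and do not come from the recipe.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "price: $42, count: 17"

# (?<=...) is positive lookbehind and (?<!...) is negative lookbehind.
# Both are zero-width: they test the text that ends at the current
# position without making that text part of the match.
after_dollar = re.findall(r"(?<=\$)\d+", text)
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

print(after_dollar)      # ['42']
print(not_after_dollar)  # ['17']
```

The `\b` in the second pattern keeps the regex from matching the trailing `2` of `42`, which is not itself preceded by a dollar sign.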
The following regexes use negative lookbehind, ‹(?<!⋯)›. Unfortunately, the regex flavors covered by this book differ in what kinds of patterns they allow you to place within lookbehind. The solutions therefore end up working a bit differently in each case. Read on to the Discussion section of this recipe for further details.
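To see one such flavor difference in practice, the sketch below asks Python's re module to compile the three lookbehind patterns used in this recipe. The helper name compiles is hypothetical. Python's re module accepts only fixed-width lookbehind, so it is expected to reject the two patterns whose lookbehind can vary in length.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern, re.IGNORECASE)
        return True
    except re.error:
        # Raised for variable-width lookbehind, among other errors.
        return False

unbounded = r"(?<!\bcat\W+)\b\w+"      # \W+ has no upper length limit
bounded = r"(?<!\bcat\W{1,9})\b\w+"    # \W{1,9} is finite but still variable
single = r"(?<!\bcat\W)\b\w+"          # always exactly four characters wide

for name, pattern in [("unbounded", unbounded),
                      ("bounded", bounded),
                      ("single", single)]:
    print(name, compiles(pattern))
```

Only the single-separator pattern should compile here, which is consistent with Python being listed solely under the single-separator solutions.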
Words not preceded by “cat”
Any number of separating nonword characters:
(?<!\bcat\W+)\b\w+
Regex options: Case insensitive
Regex flavor: .NET
Limited number of separating nonword characters:
(?<!\bcat\W{1,9})\b\w+
Regex options: Case insensitive
Regex flavors: .NET, Java
Single separating nonword character:
(?<!\bcat\W)\b\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python
(?<!\Wcat\W)(?<!^cat\W)\b\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9
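The two single-separator regexes can be tried out in Python as a rough sketch. The constant names and sample strings below are invented for this example. The second pattern avoids the word boundary inside lookbehind by testing separately for a nonword character before “cat” and for “cat” at the very start of the subject.

```python
import re

# Both patterns are taken from the solutions above; case-insensitive
# matching mirrors the "Regex options" line.
SINGLE = re.compile(r"(?<!\bcat\W)\b\w+", re.IGNORECASE)
NO_BOUNDARY = re.compile(r"(?<!\Wcat\W)(?<!^cat\W)\b\w+", re.IGNORECASE)

# Hypothetical sample text. "sat" follows the word "cat" and is skipped;
# "ran" follows "bobcat", which is a different word, so it is kept.
text = "My cat sat; the bobcat ran"
print(SINGLE.findall(text))       # ['My', 'cat', 'the', 'bobcat', 'ran']
print(NO_BOUNDARY.findall(text))  # ['My', 'cat', 'the', 'bobcat', 'ran']

# "cat" at the very start of the subject is handled by both variants.
print(SINGLE.findall("Cat naps"))       # ['Cat']
print(NO_BOUNDARY.findall("Cat naps"))  # ['Cat']

# Limitation: two separating characters (", ") defeat the
# single-separator lookbehind, so "go" is still matched.
print(SINGLE.findall("cat, go"))  # ['cat', 'go']
```

The last call shows why the single-separator variants are only an approximation of the stated problem: any word separated from “cat” by more than one nonword character slips through.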
Simulate lookbehind
JavaScript and Ruby 1.8 do not support lookbehind at all, even though they do support lookahead. However, ...