5.6. Find Any Word Not Preceded by a Specific Word

Problem

You want to match any word that is not immediately preceded by the word cat, ignoring any whitespace, punctuation, or other nonword characters that come between.

Solution

Lookbehind lets you check if text appears before a given position. It works by instructing the regex engine to temporarily step backward in the string, checking whether something can be found ending at the position where you placed the lookbehind. See Recipe 2.16 if you need to brush up on the details of lookbehind.
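As a quick illustration of the idea, here is a minimal sketch using Python's standard re module. The sample string and variable names are made up for this example. It shows that lookbehind inspects the text ending at the current position without making that text part of the match.

```python
import re

text = "cat nap"  # sample string, chosen only for illustration

# Positive lookbehind: match a word only if "cat " ends right before it.
m = re.search(r"(?<=cat )\w+", text)
found, start = m.group(), m.start()
print(found, start)  # nap 4

# Negative lookbehind: match a word only if "cat " does NOT end right before it.
not_after_cat = re.findall(r"(?<!cat )\b\w+", text)
print(not_after_cat)  # ['cat']
```

The match for "nap" starts at index 4, after the text that the lookbehind examined. The lookbehind only tests a condition and consumes nothing.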

The following regexes use negative lookbehind, (?<!…). Unfortunately, the regex flavors covered by this book differ in what kinds of patterns they allow you to place within lookbehind. The solutions therefore end up working a bit differently in each case. Read on to the “Discussion” section of this recipe for further details.

Words not preceded by “cat”

Any number of separating nonword characters:

(?<!\bcat\W+)\b\w+
Regex options: Case insensitive
Regex flavor: .NET

Limited number of separating nonword characters:

(?<!\bcat\W{1,9})\b\w+
Regex options: Case insensitive
Regex flavors: .NET, Java
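Python is not among the flavors listed for the two regexes above. The following sketch, which assumes Python's standard re module, is one way to see the restriction in practice: re requires lookbehind to have a fixed width, so it refuses to compile both patterns. The helper name compiles is hypothetical and exists only for this example.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern, re.IGNORECASE)
        return True
    except re.error:
        # re reports variable-width lookbehind as a compile-time error
        return False

unbounded_ok = compiles(r"(?<!\bcat\W+)\b\w+")     # \W+ has no upper limit
bounded_ok = compiles(r"(?<!\bcat\W{1,9})\b\w+")   # \W{1,9} varies from 1 to 9

print(unbounded_ok, bounded_ok)  # False False
```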

Single separating nonword character:

(?<!\bcat\W)\b\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python

(?<!\Wcat\W)(?<!^cat\W)\b\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9
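The two single-separator regexes can be tried out with a short Python sketch such as the one below. It assumes the standard re module and uses sample strings invented for this example. The re.IGNORECASE flag stands in for the "Case insensitive" option.

```python
import re

# The two single-separator regexes from this recipe.
simple = re.compile(r"(?<!\bcat\W)\b\w+", re.IGNORECASE)
portable = re.compile(r"(?<!\Wcat\W)(?<!^cat\W)\b\w+", re.IGNORECASE)

text = "My Cat sat. One cat, two"  # sample text, for illustration only

simple_words = simple.findall(text)
portable_words = portable.findall(text)
print(simple_words)    # ['My', 'Cat', 'One', 'cat', 'two']
print(portable_words)  # ['My', 'Cat', 'One', 'cat', 'two']

# When "cat" is the first word, the ^ alternative in the second regex
# takes over the job that \W cannot do at the start of the string.
start_words = portable.findall("cat nap")
print(start_words)     # ['cat']
```

In this sample, "sat" is skipped because exactly one nonword character separates it from "Cat". The word "two" is still matched because a comma and a space (two nonword characters) separate it from "cat", which is the limitation of the single-separator regexes.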

Simulate lookbehind

JavaScript and Ruby 1.8 do not support lookbehind at all, even though they do support lookahead. However, ...
