5.6. Find Any Word Not Preceded by a Specific Word

Problem

You want to match any word that is not immediately preceded by the word cat, ignoring any whitespace, punctuation, or other nonword characters that come between.

Solution

Lookbehind you

Lookbehind lets you check if text appears before a given position. It works by instructing the regex engine to temporarily step backward in the string, checking whether something can be found ending at the position where you placed the lookbehind. See Recipe 2.16 if you need to brush up on the details of lookbehind.

The following three regexes use negative lookbehind, which looks like (?<!…). Unfortunately, the regex flavors covered by this book differ in what kinds of patterns they allow you to place within lookbehind. As a result, the solutions end up working a bit differently in each case. Make sure to check out the Discussion of this recipe for further details.
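To see the flavor differences concretely, here is a minimal sketch that asks Python's built-in re module whether it accepts each of the three patterns used in this recipe. The dictionary labels and the compiles helper are names made up for this illustration. Python requires lookbehind to have a fixed width, so only the last pattern should compile.

```python
import re

# The three patterns from this recipe. The labels are hypothetical names
# chosen for this sketch, not terminology from any regex flavor.
PATTERNS = {
    "unbounded": r"(?<!\bcat\W+)\b\w+",
    "bounded": r"(?<!\bcat\W{1,9})\b\w+",
    "fixed": r"(?<!\bcat)(?:\W+|^)(\w+)",
}


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern, re.IGNORECASE)
        return True
    except re.error:
        # Python rejects lookbehind whose width is not fixed.
        return False


support = {label: compiles(pattern) for label, pattern in PATTERNS.items()}
print(support)
# {'unbounded': False, 'bounded': False, 'fixed': True}
```

Other engines draw the line elsewhere, so a check like this is only meaningful for the flavor you actually run it in.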

Words not preceded by “cat”

(?<!\bcat\W+)\b\w+
Regex options: Case insensitive
Regex flavor: .NET

(?<!\bcat\W{1,9})\b\w+
Regex options: Case insensitive
Regex flavors: .NET, Java

(?<!\bcat)(?:\W+|^)(\w+)
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9
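As a rough illustration of the third regex in practice, the following Python sketch collects the words it finds. The function name and sample sentence are invented for this example. Because this version of the pattern also consumes the nonword characters in front of each word, the word itself has to be read from capturing group 1 rather than from the overall match.

```python
import re

# Third regex from this recipe; the lookbehind has a fixed width, so
# Python accepts it.
NOT_AFTER_CAT = re.compile(r"(?<!\bcat)(?:\W+|^)(\w+)", re.IGNORECASE)


def words_not_after_cat(text: str) -> list[str]:
    """Return the words captured by group 1 of each match."""
    return [match.group(1) for match in NOT_AFTER_CAT.finditer(text)]


sample = "My cat is fluffy, cat-like, and a copycat indeed."
print(words_not_after_cat(sample))
# ['My', 'cat', 'fluffy', 'cat', 'and', 'a', 'copycat', 'indeed']
```

Here "is" and "like" are skipped because each directly follows the whole word "cat", while "indeed" is kept because "copycat" does not begin at a word boundary. One thing to test against your own data: since the \W+ sits outside the lookbehind, a match attempt can also begin partway through a run of two or more nonword characters. With this sketch, words_not_after_cat("cat, like") still returns "like".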

Simulate lookbehind

JavaScript and Ruby 1.8 do not support lookbehind at all, even though they do support lookahead. However, because the lookbehind for this problem appears at the very beginning of the regex, it is possible to perfectly simulate the lookbehind by splitting the regex into two parts, ...
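One way the two-part idea might look is sketched below. This is an assumption about the general technique, not the book's own listing, and it is written in Python only to stay runnable alongside the other examples here. All names are hypothetical. The first regex finds every word, and the second is tested against the text to the left of each match, standing in for the lookbehind.

```python
import re

# Part 1: the main pattern, with the lookbehind stripped off.
WORD = re.compile(r"\b\w+")

# Part 2: the inside of the lookbehind, anchored to the end of the
# left-hand context so it can only match directly before the word.
CAT_JUST_BEFORE = re.compile(r"\bcat\W+\Z", re.IGNORECASE)


def words_not_after_cat_simulated(text: str) -> list[str]:
    """Keep each word unless the text before it ends with 'cat' plus separators."""
    kept = []
    for match in WORD.finditer(text):
        left_context = text[:match.start()]
        if not CAT_JUST_BEFORE.search(left_context):
            kept.append(match.group())
    return kept


print(words_not_after_cat_simulated("cat, like a copycat indeed"))
# ['cat', 'a', 'copycat', 'indeed']
```

Since the second regex is an ordinary pattern rather than a lookbehind, it can use an unbounded \W+ even in a flavor that would reject that inside lookbehind. Slicing the left-hand context for every word keeps the sketch short but is not tuned for long inputs.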
