5.4. Find All Except a Specific Word

Problem

You want to use a regular expression to match any complete word except cat. Catwoman, vindicate, and other words that merely contain the letters “cat” should be matched—just not cat.

Solution

A negative lookahead can help you rule out specific words, and is key to this next regex:

\b(?!cat\b)\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
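
As a quick illustration, here is a minimal sketch of the regex in Python, one of the flavors listed above. The sample sentence and the variable names are made up for this example; the case-insensitive flag mirrors the "Regex options" line.

```python
import re

# The recipe's regex, compiled case-insensitively per the stated regex options.
not_cat = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)

# Hypothetical sample text, chosen to include "cat" alone and inside longer words.
text = "The cat sat near Catwoman to vindicate a CAT."

words = not_cat.findall(text)
print(words)
# ['The', 'sat', 'near', 'Catwoman', 'to', 'vindicate', 'a']
```

Both "cat" and "CAT" are skipped, while "Catwoman" and "vindicate" are still matched because the lookahead requires a word boundary right after "cat".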

Discussion

Although a negated character class (written as [^…]) makes it easy to match anything except a specific character, you can’t just write [^cat] to match anything except the word cat. [^cat] is a valid regex, but it matches any single character except c, a, or t. Hence, although \b[^cat]+\b would avoid matching the word cat, it wouldn’t match the word time either, because time contains the forbidden letter t. The regular expression \b[^c][^a][^t]\w* is no good either, because it would reject any word with c as its first letter, a as its second letter, or t as its third. Furthermore, that regex doesn’t restrict the first three characters to word characters, and it only matches words with at least three characters, since none of the negated character classes is optional.
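
The failure modes described above can be checked with a short Python sketch. The word list and the helper function accepted() are hypothetical, introduced only for this comparison; each candidate word is tested as a whole string with fullmatch.

```python
import re

# Hypothetical test words: only "cat" should be rejected.
samples = ["cat", "time", "dog", "cup", "bat", "fit", "at"]

negated_class = re.compile(r"\b[^cat]+\b")          # rejects any word containing c, a, or t
three_classes = re.compile(r"\b[^c][^a][^t]\w*")    # rejects by letter position, needs 3+ characters
lookahead = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)  # the recipe's regex

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [w for w in candidates if pattern.fullmatch(w)]

print(accepted(negated_class, samples))  # ['dog']
print(accepted(three_classes, samples))  # ['time', 'dog']
print(accepted(lookahead, samples))      # ['time', 'dog', 'cup', 'bat', 'fit', 'at']
```

Only the lookahead version accepts every word other than cat; the two character-class attempts also throw away words such as cup, bat, fit, and at.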

With all that in mind, let’s take another look at how the regular expression shown at the beginning of this recipe solved the problem:

\b      # Assert position at a word boundary.
(?!     # Not followed by:
  cat   #   Match "cat".
  \b    #   Assert position at a word boundary.
)       # End the negative lookahead.
\w+ ...
