5.4. Find All Except a Specific Word
Problem
You want to use a regular expression to match any complete word except cat. Catwoman and other words that merely contain the letters “cat” should be matched; only the standalone word cat should not.
Solution
A negative lookahead can help you rule out specific words and is the key to this next regex:
\b(?!cat\b)\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
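As a minimal sketch, assuming Python's built-in `re` module (Python is one of the flavors listed above), the Solution's regex can be passed to `findall` to collect every word except cat. The helper name `words_except_cat` and the sample sentence are hypothetical, chosen only for illustration.

```python
import re

# The Solution's regex: a word boundary, then a negative lookahead that fails
# if the next characters are the whole word "cat", then the word itself.
NOT_CAT = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)


def words_except_cat(text):
    """Hypothetical helper: return every word in text except the word 'cat' (any case)."""
    return NOT_CAT.findall(text)


if __name__ == "__main__":
    sample = "Catwoman saw a cat, a CAT, and a catalog near the cat-flap."
    print(words_except_cat(sample))
    # prints: ['Catwoman', 'saw', 'a', 'a', 'and', 'a', 'catalog', 'near', 'the', 'flap']
```

The cat in “cat-flap” is skipped because the hyphen creates a word boundary right after it, while Catwoman and catalog survive because the lookahead's ‹\b› cannot match inside them.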
Discussion
Although a negated character class (written as ‹[^⋯]›) makes it easy to match anything except a specific character, you can’t just write ‹[^cat]› to match anything except the word cat. ‹[^cat]› is a valid regex, but it matches any single character except c, a, or t. Hence, although ‹\b[^cat]+\b› would avoid matching the word cat, it wouldn’t match the word cup either, because cup contains the forbidden letter c. The regular expression ‹\b[^c][^a][^t]\w*› is no good either, because it would reject any word with c as its first letter, a as its second letter, or t as its third. Furthermore, that pattern doesn’t restrict the first three characters to word characters, and it only matches words with at least three characters, since none of the negated character classes are optional.
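These failure modes are easy to check directly. The sketch below, again assuming Python's `re` module, compiles both rejected attempts alongside the Solution's regex and tests each one against a few whole words. The helper `matches_whole_word`, the `PATTERNS` labels, and the test words are hypothetical.

```python
import re

# The two rejected attempts from the Discussion, plus the Solution's regex.
NEGATED_SET = re.compile(r"\b[^cat]+\b", re.IGNORECASE)       # no c, a, or t anywhere
POSITIONAL = re.compile(r"\b[^c][^a][^t]\w*", re.IGNORECASE)   # forbids c/a/t by position
LOOKAHEAD = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)       # the Solution's regex

# Hypothetical labels used only for printing.
PATTERNS = {"[^cat]+": NEGATED_SET, "[^c][^a][^t]": POSITIONAL, "lookahead": LOOKAHEAD}


def matches_whole_word(pattern, word):
    """Hypothetical helper: True if pattern matches all of word, not just part of it."""
    return pattern.fullmatch(word) is not None


if __name__ == "__main__":
    for word in ["cat", "cup", "bat", "sit", "ox", "dog"]:
        results = {name: matches_whole_word(p, word) for name, p in PATTERNS.items()}
        print(word, results)

    # [^c][^a][^t] also accepts non-word characters, so it can run across
    # a space and match text that includes "cat".
    print(POSITIONAL.search("a cat").group())  # prints: a cat
```

Only the lookahead version accepts cup, bat, sit, and ox while rejecting cat. ‹[^cat]+› rejects every word containing c, a, or t, and ‹[^c][^a][^t]› also rejects ox for being shorter than three characters.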
With all that in mind, let’s take another look at how the regular expression shown at the beginning of this recipe solved the problem:
\b     # Assert position at a word boundary.
(?!    # Assert that the regex below cannot be matched starting here...
  cat  #   Match "cat".
  \b   #   Assert position at a word boundary.
)      # ...
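Python's `re.VERBOSE` flag enables the same free-spacing style used in the listing above: unescaped whitespace is ignored and `#` starts a comment that runs to the end of the line. The sketch below assumes the free-spacing listing expresses the Solution's regex ‹\b(?!cat\b)\w+›, so it closes the lookahead and ends with ‹\w+›; the comments on those last two lines are written for this sketch. It then checks that the verbose and compact forms find the same words in a hypothetical sample string.

```python
import re

# Free-spacing form of the Solution's regex. re.VERBOSE ignores unescaped
# whitespace in the pattern and treats "#" to end of line as a comment.
NOT_CAT_VERBOSE = re.compile(
    r"""
    \b          # Assert position at a word boundary.
    (?!         # Assert that the regex below cannot be matched starting here...
      cat       #   Match "cat".
      \b        #   Assert position at a word boundary.
    )           # End of the negative lookahead (comment written for this sketch).
    \w+         # Match one or more word characters (assumed from the Solution).
    """,
    re.IGNORECASE | re.VERBOSE,
)

# The compact form from the Solution, for comparison.
NOT_CAT_COMPACT = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)

text = "Cat, cat, catnip, and bobcat"
assert NOT_CAT_VERBOSE.findall(text) == NOT_CAT_COMPACT.findall(text)
print(NOT_CAT_VERBOSE.findall(text))  # prints: ['catnip', 'and', 'bobcat']
```

Both forms compile to the same matching behavior; the verbose one simply trades compactness for room to document each token.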