5.5. Find Any Word Not Followed by a Specific Word
Problem
You want to match any word that is not immediately
followed by the word cat, ignoring any whitespace,
punctuation, or other nonword characters that appear in between.
Solution
Negative lookahead is the secret ingredient for this recipe:
\b\w+\b(?!\W+cat\b)
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
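As a quick check of the solution, here is a minimal Python sketch (Python is one of the flavors listed above). The helper name `words_not_before_cat` and the sample sentence are invented for illustration, and `re.IGNORECASE` stands in for the case-insensitive option:

```python
import re

# The recipe's regex, compiled with the case-insensitive option.
NOT_BEFORE_CAT = re.compile(r"\b\w+\b(?!\W+cat\b)", re.IGNORECASE)

def words_not_before_cat(text):
    """Return every word in text that is not immediately followed by 'cat'."""
    return NOT_BEFORE_CAT.findall(text)

sample = "The cat sat on the mat, cat!"
result = words_not_before_cat(sample)
print(result)  # ['cat', 'sat', 'on', 'the', 'cat']
```

"The" and "mat" are skipped because the next word is cat, even though a comma and a space separate "mat" from it. The final "cat" is still returned, since nothing but punctuation follows it.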
Discussion
As with many other recipes in this chapter, word boundaries (‹\b›) and the
word character token (‹\w›) work together to match a complete word. You can
find in-depth descriptions of these features in Recipe 2.6.
The ‹(?!⋯)› surrounding the second
part of this regex is a negative lookahead. Lookahead tells the regex
engine to temporarily step forward in the string, to check whether the
pattern inside the lookahead can be matched just ahead of the current
position. It does not consume any of the characters matched inside the
lookahead. Instead, it merely asserts whether a match is possible. Since
we’re using a negative lookahead,
the result of the assertion is inverted. In other words, if the pattern
inside the lookahead can be matched just ahead, the match attempt fails,
and the regex engine moves forward to try all over again starting from the
next character in the subject string. You can find much more detail
about lookahead (and its counterpart, lookbehind) in Recipe 2.16.
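The behavior described above can be observed directly. The following Python sketch is only an illustration: the subject string is made up, and the positive-lookahead variant ‹(?=⋯)› is included purely for contrast; it is not part of this recipe's solution.

```python
import re

subject = "black cat sleeps"

# Negative lookahead: "black" is rejected because " cat" follows it, so the
# engine moves on and the first successful match is "cat" itself.
negative = re.search(r"\b\w+\b(?!\W+cat\b)", subject)
print(negative.group(), negative.span())  # cat (6, 9)

# Positive lookahead for contrast: "black" matches, and the match ends at
# index 5, so the " cat" checked by the lookahead is not consumed.
positive = re.search(r"\b\w+\b(?=\W+cat\b)", subject)
print(positive.group(), positive.span())  # black (0, 5)
```

In both cases the reported span covers only the word itself, which shows that the text examined inside the lookahead never becomes part of the match.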
As for the pattern inside the lookahead, the ‹\W+› matches one or more nonword characters, such as whitespace ...