2.6. Match Whole Words
Problem
Create a regex that matches cat in “My cat is brown”, but not in “category” or “bobcat”. Create another regex that matches cat in “staccato”, but not in any of the three previous subject strings.
Solution
Word boundaries
\bcat\b
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Nonboundaries
\Bcat\B
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
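To check both solutions against the four subject strings from the problem, here is a quick sketch using Python's re module (Python is one of the flavors listed above). The helper name classify is purely illustrative and not part of any library:

```python
import re

# The two solution regexes from this recipe.
WHOLE_WORD = re.compile(r"\bcat\b")   # cat as a complete word
INSIDE_WORD = re.compile(r"\Bcat\B")  # cat with word characters on both sides


def classify(subject):
    """Return (whole_word_match, inside_word_match) for one subject string."""
    return (WHOLE_WORD.search(subject) is not None,
            INSIDE_WORD.search(subject) is not None)


for subject in ["My cat is brown", "category", "bobcat", "staccato"]:
    print(f"{subject!r:18} {classify(subject)}")
```

Only “My cat is brown” matches the first regex, and only “staccato” matches the second. Neither regex matches “category” or “bobcat”, because each of those has a word boundary on one side of cat and not on the other.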
Discussion
Word boundaries
The regular expression token ‹\b› is called a word boundary. It matches at the start or the end of a word. By itself, it results in a zero-length match. ‹\b› is an anchor, just like the tokens introduced in the previous section.
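Because ‹\b› matches zero characters, substituting text at each match inserts characters without consuming any. This small Python sketch makes the otherwise invisible match positions visible. The helper mark_boundaries and the marker character | are arbitrary choices made only for illustration:

```python
import re


def mark_boundaries(text, marker="|"):
    """Insert marker at every word boundary; no characters are removed."""
    return re.sub(r"\b", marker, text)


print(mark_boundaries("My cat is brown"))  # prints |My| |cat| |is| |brown|
```

Deleting the markers gives back the original string unchanged, which confirms that the matches themselves were empty.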
Strictly speaking, ‹\b› matches in these three positions:
Before the first character in the subject, if the first character is a word character
After the last character in the subject, if the last character is a word character
Between two characters in the subject, where one is a word character and the other is not a word character
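You can check the three rules by listing every offset where ‹\b› matches. In this Python sketch, boundary_positions is a hypothetical helper. Offsets count characters from 0, so offset 0 lies before the first character and len(text) lies after the last one:

```python
import re


def boundary_positions(text):
    """Return the character offsets at which \\b finds a zero-length match."""
    return [m.start() for m in re.finditer(r"\b", text)]


# Word characters at both ends: the first two rules apply, so 0 and 15 appear.
print(boundary_positions("My cat is brown"))
# Nonword characters at both ends: only the third rule applies, so 0 and 5 do not.
print(boundary_positions("(cat)"))
```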
To run a “whole words only” search using a regular expression, simply place the word between two word boundaries, as we did with ‹\bcat\b›. The first ‹\b› requires the ‹c› to occur at the very start of the string, or after a nonword character. The second ‹\b› requires the ‹t› to occur at the very end of the string, or before a nonword character.
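A “whole words only” search for an arbitrary word can build the pattern at run time by wrapping the word in ‹\b›. The sketch below is in Python, and contains_whole_word is a hypothetical helper. It uses re.escape so that any metacharacters in the word are matched literally. It also assumes the word begins and ends with a word character. For a word such as C++, the trailing ‹\b› would instead require a word character after the final +, which changes the meaning of the search:

```python
import re


def contains_whole_word(word, text):
    """True if word occurs in text as a whole word.

    Assumes word starts and ends with a word character, so the
    surrounding \\b anchors sit between word and nonword characters.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text) is not None


print(contains_whole_word("cat", "The cat sat."))  # nonword characters on both sides
print(contains_whole_word("cat", "(cat)"))         # punctuation counts as nonword
print(contains_whole_word("cat", "cat_food"))      # underscore is a word character
```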
Line break characters are nonword characters. ‹\b› will match after a line break if the line break is immediately followed by a word character. ...
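Line breaks are nonword characters, so ‹\bcat\b› finds cat on its own line without any special option. In the Python sketch below, the sample text is invented for illustration. The pattern is compiled without re.MULTILINE, a flag that changes only the behavior of ‹^› and ‹$›:

```python
import re

text = "dog\ncat\nbird"

# No re.MULTILINE flag: \b treats "\n" as an ordinary nonword character.
print(re.findall(r"\bcat\b", text))

# Offsets of \b in "dog\ncat": 3 is just before the line break, 4 just after it.
print([m.start() for m in re.finditer(r"\b", "dog\ncat")])
```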