2.6. Match Whole Words

Problem

Create a regex that matches cat in My cat is brown, but not in category or bobcat. Create another regex that matches cat in staccato, but not in any of the three previous subject strings.

Solution

Word boundaries

\bcat\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Nonboundaries

\Bcat\B
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
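As a quick check, both regexes can be run against the four subject strings from the problem statement. The following is a minimal sketch using Python's re module; the variable names are illustrative assumptions, not part of the recipe.

```python
import re

# The four subject strings named in the problem statement
subjects = ["My cat is brown", "category", "bobcat", "staccato"]

whole_word = re.compile(r"\bcat\b")   # cat as a complete word
inside_word = re.compile(r"\Bcat\B")  # cat with word characters on both sides

# Collect the subjects in which each regex finds a match
whole_hits = [s for s in subjects if whole_word.search(s)]
inside_hits = [s for s in subjects if inside_word.search(s)]

print(whole_hits)   # ['My cat is brown']
print(inside_hits)  # ['staccato']
```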

Discussion

Word boundaries

The regular expression token \b is called a word boundary. It matches at the start or the end of a word. By itself, it results in a zero-length match. \b is an anchor, just like the tokens introduced in the previous section.

Strictly speaking, \b matches in these three positions:

  • Before the first character in the subject, if the first character is a word character

  • After the last character in the subject, if the last character is a word character

  • Between two characters in the subject, where one is a word character and the other is not a word character
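The three positions can be seen by searching for \b on its own and listing where the zero-length matches occur. This sketch assumes Python's re module and a sample subject string chosen only for illustration.

```python
import re

subject = "cat, dog"

# Every match of a bare \b is zero-length, so start() and end() coincide
matches = list(re.finditer(r"\b", subject))
positions = [m.start() for m in matches]

print(positions)  # [0, 3, 5, 8]
```

Position 0 is before the first character, position 8 is after the last character, and positions 3 and 5 each lie between a word character and a nonword character. There is no match at position 4, where the comma and the space are both nonword characters.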

To run a “whole words only” search using a regular expression, simply place the word between two word boundaries, as we did with \bcat\b. The first \b requires the c to occur at the very start of the string, or after a nonword character. The second \b requires the t to occur at the very end of the string, or before a nonword character.
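The same idea can be wrapped in a small helper that builds the regex for any word. The function name find_whole_word is hypothetical, and the sketch assumes the word begins and ends with a word character; otherwise the surrounding \b tokens would not behave as described above.

```python
import re

def find_whole_word(word, text):
    """Return the start offsets where `word` occurs as a whole word in `text`.

    Assumes `word` starts and ends with a word character.
    """
    # re.escape keeps any regex metacharacters in `word` literal
    pattern = r"\b" + re.escape(word) + r"\b"
    return [m.start() for m in re.finditer(pattern, text)]

offsets = find_whole_word("cat", "cat category bobcat cat")
print(offsets)  # [0, 20]
```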

Line break characters are nonword characters. \b will match after a line break if the line break is immediately followed by a word character. ...
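A short sketch of the line break case, assuming Python's re module and a two-line subject string made up for illustration:

```python
import re

subject = "dog\ncat"

# The line feed at index 3 is a nonword character, so \b matches
# between it and the c that follows
match = re.search(r"\bcat\b", subject)
print(match.start())  # 4
```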
