5.3. Find Similar Words

Problem

You have several problems in this case:

  • You want to find all occurrences of both color and colour in a string.

  • You want to find any of three words that end with “at”: bat, cat, or rat.

  • You want to find any word ending with phobia.

  • You want to find common variations on the name “Steven”: Steve, Steven, and Stephen.

  • You want to match any common form of the term “regular expression.”

Solution

Regular expressions that solve each of the problems just listed are shown in turn. All of these solutions use the case-insensitive option.

Color or colour

\bcolou?r\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
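A minimal Python sketch of this pattern, assuming re.IGNORECASE as the equivalent of the "Case insensitive" option. The sample sentence is invented for illustration. The optional u (u?) accepts both spellings, and the word boundaries reject "colorful" and "watercolor":

```python
import re

# re.IGNORECASE plays the role of the "Case insensitive" regex option.
COLOR = re.compile(r"\bcolou?r\b", re.IGNORECASE)

text = "Color, colour, and COLOUR match; colorful and watercolor do not."
matches = COLOR.findall(text)
print(matches)  # ['Color', 'colour', 'COLOUR']
```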

Bat, cat, or rat

\b[bcr]at\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
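The same kind of check for the character class, again assuming re.IGNORECASE for case insensitivity and using a made-up sentence. Only the first letter varies, so "sat" and "mat" are skipped, and "cattle" fails because no word boundary follows its "cat":

```python
import re

# [bcr] allows exactly one of b, c, or r before "at".
BCR_AT = re.compile(r"\b[bcr]at\b", re.IGNORECASE)

text = "A bat, a Cat, and a RAT sat on the mat near the cattle."
matches = BCR_AT.findall(text)
print(matches)  # ['bat', 'Cat', 'RAT']
```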

Words ending with “phobia”

\b\w*phobia\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
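A short Python sketch under the same assumptions (re.IGNORECASE, an invented sample text). Because \w* can match zero characters, the bare word "phobia" also matches. The trailing \b rejects the plural "phobias":

```python
import re

# \w* lets any run of word characters (possibly none) precede "phobia".
PHOBIA = re.compile(r"\b\w*phobia\b", re.IGNORECASE)

text = "Arachnophobia and claustrophobia match; Phobia alone matches, phobias does not."
matches = PHOBIA.findall(text)
print(matches)  # ['Arachnophobia', 'claustrophobia', 'Phobia']
```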

Steve, Steven, or Stephen

\bSte(?:ven?|phen)\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
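A Python sketch of the alternation, assuming re.IGNORECASE and an invented sample sentence. The group is noncapturing, so findall() returns the whole match rather than only the group's text. Longer names such as "Stevens" and "Stephanie" are rejected:

```python
import re

# (?:...) groups the alternatives without capturing; "n?" makes Steve/Steven one branch.
STEVE = re.compile(r"\bSte(?:ven?|phen)\b", re.IGNORECASE)

text = "Steve, Steven, and STEPHEN met Stevens and Stephanie."
matches = STEVE.findall(text)
print(matches)  # ['Steve', 'Steven', 'STEPHEN']
```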

Variations of “regular expression”

\breg(?:ular expressions?|ex(?:ps?|e[sn])?)\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
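A Python sketch of the "regular expression" pattern, assuming re.IGNORECASE and an invented sample sentence. Here the gap between "regular" and "expressions" is written as a literal space character. The nested optional group covers regexp, regexps, regexes, and regexen, while "regexed" and a lone "regular" are rejected:

```python
import re

# The literal space in "ular expressions?" matches the space between the two words.
REGEX_TERM = re.compile(r"\breg(?:ular expressions?|ex(?:ps?|e[sn])?)\b", re.IGNORECASE)

text = "Regular expressions, regexes, regexps, regexen, a regexp, and a Regex; not regexed or regular."
matches = REGEX_TERM.findall(text)
print(matches)
# ['Regular expressions', 'regexes', 'regexps', 'regexen', 'regexp', 'Regex']
```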

Discussion

Use word boundaries to match complete words

All five of these regular expressions use word boundaries (\b) to ensure that they match only complete words. The patterns use several different approaches to allow variation in the words that they ...
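To see what the word boundaries contribute, this Python sketch runs the Steve/Steven/Stephen pattern with and without \b on an invented sentence (re.IGNORECASE is assumed for the case-insensitive option). Without the boundaries, the pattern also finds "Steven" inside "Stevenson" and "Stephen" inside "Stephens":

```python
import re

text = "Steve, Stevenson, and Stephens"

# Same alternation, first without and then with the surrounding word boundaries.
without_boundaries = re.findall(r"Ste(?:ven?|phen)", text, re.IGNORECASE)
with_boundaries = re.findall(r"\bSte(?:ven?|phen)\b", text, re.IGNORECASE)

print(without_boundaries)  # ['Steve', 'Steven', 'Stephen']
print(with_boundaries)     # ['Steve']
```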
