2.17. Match One of Two Alternatives Based on a Condition

Problem

Create a regular expression that matches a comma-delimited list of the words one, two, and three. Each word can occur any number of times in the list, but each word must appear at least once.

Solution

\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}(?(1)|(?!))(?(2)|(?!))(?(3)|(?!))
Regex options: None
Regex flavors: .NET, PCRE, Perl, Python
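As a quick check of the solution, here is a minimal sketch that runs the regex above in Python's re module. The names ALL_THREE and has_all_three are made up for this illustration, and using search (rather than matching the whole subject) is an assumption about how you want to apply the regex.

```python
import re

# The recipe's regex, unchanged. Each conditional (?(n)|(?!)) matches the
# empty string if group n took part in the match; otherwise it reaches (?!),
# an empty negative lookahead that always fails.
ALL_THREE = re.compile(
    r"\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}"
    r"(?(1)|(?!))(?(2)|(?!))(?(3)|(?!))"
)

def has_all_three(text):
    """Return True if text contains a list in which one, two, and three all occur."""
    return ALL_THREE.search(text) is not None

print(has_all_three("one,two,three"))      # all three words present
print(has_all_three("three,one,one,two"))  # order and repetition do not matter
print(has_all_three("one,two,one"))        # "three" is missing
```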

Java, JavaScript, and Ruby do not support conditionals. When programming in these languages (or any other language), you can use the regular expression without the conditionals, and write some extra code to check whether each of the three capturing groups matched something.

\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
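The "extra code" could look something like the following sketch, written in Python only to keep the examples in one language. LIST_OF_WORDS and has_all_three_checked are hypothetical names. The check relies on Python's re module keeping a group's capture from an earlier iteration of the repeated group; not every flavor does this, so verify the behavior in your own language before porting the idea.

```python
import re

# The conditional-free version of the regex: it only finds a list of at
# least three words, without verifying that all three words occur.
LIST_OF_WORDS = re.compile(r"\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}")

def has_all_three_checked(text):
    """Return True if some matched list used all three capturing groups."""
    for match in LIST_OF_WORDS.finditer(text):
        # group(n) is None when capturing group n never took part in the match.
        if all(match.group(n) is not None for n in (1, 2, 3)):
            return True
    return False

print(has_all_three_checked("one,two,three"))
print(has_all_three_checked("two,two,two"))
```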

Discussion

.NET, PCRE, Perl, and Python support conditionals using numbered capturing groups. (?(1)then|else) is a conditional that checks whether the first capturing group has already matched something. If it has, the regex engine attempts to match then. If the capturing group has not participated in the match attempt thus far, the engine attempts the else part instead.
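To see the then and else branches in isolation, consider this small illustration. The regex (a)?b(?(1)c|d) is not part of the recipe's solution, and the name follows_rule is invented for the example: if the optional group 1 matched a, the conditional requires c; if it did not, the conditional requires d.

```python
import re

# Group 1 is optional. The conditional (?(1)c|d) then demands "c" when
# group 1 matched "a", and "d" when group 1 did not participate.
CONDITIONAL = re.compile(r"(a)?b(?(1)c|d)")

def follows_rule(text):
    """Return True if the entire text matches the conditional regex."""
    return CONDITIONAL.fullmatch(text) is not None

for subject in ("abc", "bd", "abd", "bc"):
    print(subject, follows_rule(subject))
```

Only abc and bd satisfy the regex as a whole; abd takes the then branch and fails on c, while bc takes the else branch and fails on d.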

The parentheses, question mark, and vertical bar are all part of the syntax for the conditional. They don’t have their usual meaning. You can use any kind of regular expression for the then and else parts. The only restriction is that if you want to use alternation for one of the parts, you have to use a group to keep it together. Only one ...
