2.3. Match One of Many Characters

Problem

Create one regular expression to match all common misspellings of calendar, so you can find this word in a document without having to trust the author’s spelling ability. Allow an a or e to be used in each of the vowel positions. Create another regular expression to match a single hexadecimal character. Create a third regex to match a single character that is not a hexadecimal character.

The problems in this recipe are used to explain an important and commonly used regex construct called a character class.

Solution

Calendar with misspellings

c[ae]l[ae]nd[ae]r
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hexadecimal character

[a-fA-F0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Nonhexadecimal character

[^a-fA-F0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
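
As a rough illustration, the two hexadecimal classes can be tried with Python's re module. This is a minimal sketch: the sample string and the variable names are made up for the demonstration and are not part of the recipe.

```python
import re

# The two character classes from the solution above.
HEX_CHAR = re.compile(r"[a-fA-F0-9]")
NON_HEX_CHAR = re.compile(r"[^a-fA-F0-9]")

# Hypothetical sample text, chosen only for this demonstration.
sample = "0xBEEF-g9"

# findall() returns every single character that the class matches.
hex_chars = HEX_CHAR.findall(sample)          # ['0', 'B', 'E', 'E', 'F', '9']
non_hex_chars = NON_HEX_CHAR.findall(sample)  # ['x', '-', 'g']

print(hex_chars)
print(non_hex_chars)
```

Every character of the sample lands in exactly one of the two lists, because the negated class matches precisely the characters that the first class rejects.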

Discussion

The notation using square brackets is called a character class. A character class matches a single character out of a list of possible characters. Each of the three character classes in the first regex matches either an a or an e, independently of the other two. When you test calendar against this regex, the first character class matches a, the second e, and the third a.
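
The independence of the three classes can be checked with a short Python sketch. The word list and variable names here are illustrative assumptions, and fullmatch() is used only to keep the test strict; searching a document would normally use search() or finditer().

```python
import re
from itertools import product

CALENDAR = re.compile(r"c[ae]l[ae]nd[ae]r")

# Hypothetical candidate words: five spellings the regex accepts, two it rejects.
words = ["calendar", "calender", "celendar", "calandar",
         "celender", "calindar", "colander"]
matched = [w for w in words if CALENDAR.fullmatch(w)]
print(matched)  # ['calendar', 'calender', 'celendar', 'calandar', 'celender']

# Each class chooses a or e on its own, so 2 * 2 * 2 = 8 spellings match.
variants = ["c{}l{}nd{}r".format(*vowels) for vowels in product("ae", repeat=3)]
all_match = all(CALENDAR.fullmatch(v) for v in variants)
print(len(variants), all_match)  # 8 True
```

The words calindar and colander are rejected because i and o are not listed in any of the classes.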

Inside a character class, only four characters have a special function: \, ^, -, and ]. If you’re using Java or .NET, the opening bracket [ is also a metacharacter inside character classes.
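
A small Python sketch shows one way to include all four of these characters in a class as literals, by escaping each with a backslash. The sample string and names are assumptions made for illustration, and the behavior shown is that of Python's re module; other flavors may differ in detail.

```python
import re

# One class that matches a literal backslash, caret, hyphen, or closing bracket.
# Each special character is escaped with a backslash inside the class.
SPECIALS = re.compile(r"[\\\^\-\]]")

# Hypothetical sample: letters separated by the four special characters.
sample = r"a\b^c-d]e"

found = SPECIALS.findall(sample)
print(found)  # ['\\', '^', '-', ']']
```

Python prints the backslash as '\\' in the list because that is how a single backslash is written in a string literal.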

A backslash always escapes the character ...
