2.3. Match One of Many Characters
Problem
Create one regular expression to match all common misspellings of calendar, so you can find this word in a document without having to trust the author’s spelling ability. Allow an a or e to be used in each of the vowel positions. Create another regular expression to match a single hexadecimal character. Create a third regex to match a single character that is not a hexadecimal character.
The problems in this recipe are used to explain an important and commonly used regex construct called a character class.
Solution
Calendar with misspellings
c[ae]l[ae]nd[ae]r
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
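A minimal sketch of the calendar regex in use, assuming Python’s re module (Python is one of the listed flavors). The sample text is made up for illustration. Because no options are set, matching is case-sensitive, so a capitalized Calendar is not found.

```python
import re

# The recipe's regex, compiled with no options (matching is case-sensitive).
calendar_re = re.compile(r"c[ae]l[ae]nd[ae]r")

# Hypothetical sample text containing correct and misspelled forms.
text = "Check the calendar, the calender, the celendar, and the Calendar."
found = calendar_re.findall(text)
print(found)  # ['calendar', 'calender', 'celendar']
```

To also catch the capitalized form, compile with re.IGNORECASE; that changes the regex options, which the recipe leaves at None.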
Hexadecimal character
[a-fA-F0-9]
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Nonhexadecimal character
[^a-fA-F0-9]
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
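The two hexadecimal classes can be compared side by side in a short Python sketch (Python’s re module is assumed; the sample strings are invented). Each match is exactly one character. The negated class still has to consume a character, so it finds nothing in a string made only of hex digits, and nothing in an empty string.

```python
import re

hex_char = re.compile(r"[a-fA-F0-9]")       # one hexadecimal character
non_hex_char = re.compile(r"[^a-fA-F0-9]")  # one character that is NOT hex

sample = "0x1F g"
hex_found = hex_char.findall(sample)
non_hex_found = non_hex_char.findall(sample)
print(hex_found)      # ['0', '1', 'F']
print(non_hex_found)  # ['x', ' ', 'g']

# A negated class matches a character outside the list; it does not
# match "the absence of hex", so these searches find nothing.
print(non_hex_char.search("C0FFEE"))  # None
print(non_hex_char.search(""))        # None
```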
Discussion
The notation using square brackets is called a character class. A character class matches a single character out of a list of possible characters. Each of the three classes in the first regex matches either an a or an e, and each does so independently of the others, so the regex accepts all 2 × 2 × 2 = 8 combinations. When you test calendar against this regex, the first character class matches a, the second e, and the third a.
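A small Python sketch can make the independence visible. It assumes Python’s re and itertools modules. The capturing-group version of the pattern is a hypothetical variant added only to show which character each class consumed; it is not part of the recipe’s solution.

```python
import itertools
import re

calendar_re = re.compile(r"c[ae]l[ae]nd[ae]r")

# Hypothetical variant: wrap each class in a group to see what it matched.
m = re.fullmatch(r"c([ae])l([ae])nd([ae])r", "calendar")
print(m.groups())  # ('a', 'e', 'a')

# Every choice of 'a' or 'e' in the three vowel positions is accepted.
spellings = ["c{}l{}nd{}r".format(*v) for v in itertools.product("ae", repeat=3)]
print(len(spellings))  # 8
print(all(calendar_re.fullmatch(s) for s in spellings))  # True

# Each class consumes exactly one character, so these fail.
print(calendar_re.fullmatch("caalendar"))  # None
print(calendar_re.fullmatch("calindar"))   # None
```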
Inside a character class, only four characters have a special function: \, ^, -, and ]. If you’re using Java or .NET, the opening bracket [ is also a metacharacter inside character classes.
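The following Python sketch (assuming Python’s re module; the sample strings are invented) escapes the four special characters to match them literally, shows how an unescaped hyphen forms a range, and confirms that metacharacters such as . and * are ordinary inside a class. Escaping [ as well is accepted by the flavors listed above and avoids the Java/.NET difference, though this sketch does not need it.

```python
import re

# Escape \ ^ - ] to match them as literal characters inside a class.
special = re.compile(r"[\\\^\-\]]")
special_found = special.findall(r"a^b-c]d\e")
print(special_found)  # ['^', '-', ']', '\\']

# An unescaped hyphen between two characters forms a range...
range_found = re.findall(r"[a-c]", "a-b")
print(range_found)    # ['a', 'b']
# ...while an escaped hyphen is just a literal '-'.
literal_found = re.findall(r"[a\-c]", "a-b")
print(literal_found)  # ['a', '-']

# Outside-the-class metacharacters like . * + ? are plain characters here.
plain_found = re.findall(r"[.*+?]", "1.5*2+x?")
print(plain_found)    # ['.', '*', '+', '?']
```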
A backslash always escapes the character ...