2.3. Match One of Many Characters
Problem
Create one regular expression to match all common misspellings of calendar, so you can find this word in a document without having to trust the author’s spelling ability. Allow an a or e to be used in each of the vowel positions. Create another regular expression to match a single hexadecimal character. Create a third regex to match a single character that is not a hexadecimal character.
The problems in this recipe are used to explain an important and commonly used regex construct called a character class.
Solution
Calendar with misspellings
c[ae]l[ae]nd[ae]r
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Hexadecimal character
[a-fA-F0-9]
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Nonhexadecimal character
[^a-fA-F0-9]
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
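The following minimal Python sketch applies the hexadecimal and nonhexadecimal classes to a sample string. Python's re module is one of the flavors listed above. The sample string and the names HEX and NON_HEX are illustrative assumptions, not part of the recipe.

```python
import re

# Character class that matches exactly one hexadecimal digit.
HEX = re.compile(r"[a-fA-F0-9]")
# Negated class: exactly one character that is NOT a hexadecimal digit.
NON_HEX = re.compile(r"[^a-fA-F0-9]")

sample = "0x1G"  # illustrative input

# Each match consumes a single character, so findall returns
# a list of one-character strings.
hex_chars = HEX.findall(sample)
non_hex_chars = NON_HEX.findall(sample)

print(hex_chars)      # ['0', '1']
print(non_hex_chars)  # ['x', 'G']
```

A negated class still has to match exactly one character. For that reason, NON_HEX.search("") returns None. The negated class also matches characters such as a line break, because a line break is not a hexadecimal digit.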
Discussion
The notation using square brackets is called a character class. A character class matches a single character out of a list of possible characters. Each of the three classes in the first regex matches either an a or an e, independently of the other two.
When you test calendar against this regex, the first
character class matches a, the second e, and the third a.
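Because each class chooses its letter independently, the first regex accepts every combination of a and e in the three vowel positions. The Python sketch below checks this by generating all eight spellings with itertools.product. The variable names are hypothetical, chosen only for this illustration.

```python
import itertools
import re

CALENDAR = re.compile(r"c[ae]l[ae]nd[ae]r")

# Build all 2 * 2 * 2 = 8 spellings that use 'a' or 'e' in each vowel position.
spellings = ["c{}l{}nd{}r".format(*vowels)
             for vowels in itertools.product("ae", repeat=3)]

# Every combination matches, because each class picks its letter on its own.
all_match = all(CALENDAR.fullmatch(word) for word in spellings)

# Any other vowel is not in the class, so the whole match fails.
print(len(spellings), all_match)       # 8 True
print(CALENDAR.fullmatch("calandor"))  # None
```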
Inside a character class, only four characters have a special function: \, ^, -, and ]. If you’re using Java or .NET, the opening bracket [ is also a metacharacter inside character classes.
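The Python sketch below shows the four metacharacters in action. It assumes the simplest way to make each one literal is to escape it with a backslash. The pattern names and sample text are made up for this example. Because it uses Python rather than Java or .NET, the extra [ case is not shown.

```python
import re

# Each of the four class metacharacters is escaped with a backslash,
# so this class matches a literal backslash, ], ^, or -.
LITERALS = re.compile(r"[\\\]\^\-]")

text = r"a\b]c^d-e"  # raw string: contains one of each metacharacter
found = LITERALS.findall(text)
print(found)  # ['\\', ']', '^', '-']

# Unescaped, the same characters act as operators:
# '-' forms a range, and a leading '^' negates the class.
RANGE = re.compile(r"[a-c]")     # matches a, b, or c
NEGATED = re.compile(r"[^a-c]")  # matches any character except a, b, c
print(RANGE.findall("abcd"))     # ['a', 'b', 'c']
print(NEGATED.findall("abcd"))   # ['d']
```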
A backslash always escapes the character ...