2.3. Match One of Many Characters
Problem
Create one regular expression to match all common misspellings of calendar, so you can
find this word in a document without having to trust the author’s
spelling ability. Allow an a or e to be used in each of the vowel
positions. Create another regular expression to match a single
hexadecimal character. Create a third regex to match a single
character that is not a hexadecimal character.
Solution
Calendar with misspellings
c[ae]l[ae]nd[ae]r
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Hexadecimal character
[a-fA-F0-9]
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Nonhexadecimal character
[^a-fA-F0-9]
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
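As a quick illustration of the two hexadecimal classes, here is a minimal sketch in Python, one of the flavors listed above. The helper names `hex_chars` and `non_hex_chars` are made up for this example and are not part of the recipe.

```python
import re

# The two character classes from the solution, compiled once.
HEX = re.compile(r"[a-fA-F0-9]")
NON_HEX = re.compile(r"[^a-fA-F0-9]")


def hex_chars(text):
    """Return every single character in text that is a hexadecimal digit."""
    return HEX.findall(text)


def non_hex_chars(text):
    """Return every single character in text that is not a hexadecimal digit."""
    return NON_HEX.findall(text)


print(hex_chars("0xBEEF"))      # the digits and letters, but not the x
print(non_hex_chars("0xBEEF"))  # only the x
```

Note that the negated class matches any character not listed, so it also matches spaces and line breaks.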
Discussion
The notation using square brackets is called a
character class. A character class matches a single character out of a
list of possible characters. The three classes in the first regex
match either an a or an e. They do so independently. When you
test calendar against this regex, the first
character class matches a, the second e, and the third a.
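Because each of the three classes chooses between two letters independently, the regex accepts 2 × 2 × 2 = 8 spellings. The following Python sketch enumerates them and checks each one against the regex; the function names are hypothetical and chosen only for this illustration.

```python
import re
from itertools import product

# The calendar regex from the solution.
CALENDAR = re.compile(r"c[ae]l[ae]nd[ae]r")


def spelling_variants():
    """Build all spellings with a or e in each of the three vowel positions."""
    return ["c{}l{}nd{}r".format(*vowels) for vowels in product("ae", repeat=3)]


def find_calendar(text):
    """Return every substring of text that the calendar regex matches."""
    return CALENDAR.findall(text)


variants = spelling_variants()
print(len(variants))                                  # number of accepted spellings
print(all(CALENDAR.fullmatch(v) for v in variants))   # each variant matches
print(find_calendar("My calender and celandar, not a colander"))
```

A word such as colander is not matched, because o is not listed in the first character class.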
Outside character classes, a dozen punctuation characters are metacharacters. Inside a character class, only four characters have a special function: \, ^, -, and ]. If you’re using Java or .NET, the opening bracket [ is also a metacharacter inside character classes. All other characters are literals and simply add themselves to the character class. ...
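To make the difference between the two sets of metacharacters concrete, here is a small Python sketch. It assumes Python's `re` module and uses a backslash to escape each of the four special characters inside the class, which is one way to include them as literals; the names `SPECIAL_IN_CLASS` and `LITERAL_IN_CLASS` are invented for this example.

```python
import re

# Inside a class, the four special characters are escaped with a
# backslash here so that they are matched literally.
SPECIAL_IN_CLASS = re.compile(r"[\\\^\-\]]")

# Characters that are metacharacters outside a class ($ . * + ?)
# need no escaping inside one: they simply add themselves to the class.
LITERAL_IN_CLASS = re.compile(r"[$.*+?]")

print(SPECIAL_IN_CLASS.findall(r"a\b^c-d]e"))  # backslash, caret, hyphen, bracket
print(LITERAL_IN_CLASS.findall("1+1=2?"))      # plus sign and question mark
```

This sketch covers Python only. Other flavors may treat some characters differently inside a class, as the note about Java and .NET indicates.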