2.18. Add Comments to a Regular Expression
Problem
‹\d{4}-\d{2}-\d{2}› matches a date in yyyy-mm-dd
format, without doing any validation of the numbers. Such a
simple regular expression is appropriate when you know your data does
not contain any invalid dates. Add comments to this regular expression
to indicate what each part of the regular expression does.
Solution
\d{4} # Year
- # Separator
\d{2} # Month
- # Separator
\d{2} # Day| Regex options: Free-spacing |
| Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
Discussion
Free-spacing mode
Regular expressions can quickly become complicated and difficult to understand. Just as you should comment source code, you should comment all but the most trivial regular expressions.
All regular expression flavors in this book, except JavaScript, offer an alternative regular expression syntax that makes it very easy to clearly comment your regular expressions. You can enable this syntax by turning on the free-spacing option. It has different names in various programming languages.
In .NET, set the RegexOptions.IgnorePatternWhitespace option.
In Java, pass the Pattern.COMMENTS
flag. Python expects re.VERBOSE. PHP, Perl, and Ruby use the
/x flag.
Turning on free-spacing mode has two effects. It turns the hash symbol (#) into a metacharacter, outside character classes. The hash starts a comment that runs until the end of the line or the end of the regex (whichever comes first). The hash and everything after it is simply ignored by the regular expression engine. ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access