Processing Modes
There are four flags that change how regular expressions are evaluated:
s
Regular expressions are evaluated in what the specs refer to as “dot-all” mode. When this flag is used, the dot operator (.) matches any character. Under normal processing (without the s flag), the dot operator matches any character except the newline character (#xA). This flag is useful when you want to match strings that might include a newline character.
Note: Perl and other languages refer to this as “single-line” mode; that’s why the abbreviation for “dot-all” mode is s.

m
Regular expressions are evaluated in multiline mode. By default, the metacharacter ^ matches the start of the entire string, while $ matches the end of the entire string. In multiline mode, ^ matches the start of any line within the string, and $ matches the end of any line within the string.

i
Regular expressions are evaluated in case-insensitive mode. With this flag, the regular expression "a" matches both "a" and "A".
Note that Unicode issues can complicate this greatly. For example, the XQuery 1.0 and XPath 2.0 Functions and Operators spec gives the example of the Unicode sign for degrees Kelvin (K, U+212A), which looks like the letter "K" and is treated as that letter in case-insensitive mode. The combination of regex="k" and flags="i" matches the Kelvin sign as well as the letters "k" and "K".
Other Unicode characters don’t convert to letters. For example, the Unicode symbol for the Roman numeral I (Ⅰ, U+2160) looks like the letter I, but does not convert to one.

x
All whitespace characters ...
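The behavior of the s, m, and i flags can be tried out in any regex engine with comparable modes. The following is a minimal sketch in Python, not XPath: it assumes that Python's re.DOTALL, re.MULTILINE, and re.IGNORECASE are close enough analogues of the s, m, and i flags for these simple cases. The sample strings and variable names are made up for illustration, and the two engines differ in other details (for example, how $ treats a trailing newline).

```python
import re

# Hypothetical sample text containing one newline (#xA).
text = "first line\nsecond line"

# s (dot-all): without the flag, "." does not match the newline.
dot_default = re.search(r"line.second", text) is not None
dot_all = re.search(r"line.second", text, re.DOTALL) is not None

# m (multiline): "^" matches at the start of every line, not only
# at the start of the entire string.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# i (case-insensitive), including the two Unicode cases from the text.
kelvin_sign = "\u212a"   # KELVIN SIGN, looks like "K"
roman_one = "\u2160"     # ROMAN NUMERAL ONE, looks like "I"
a_matches_upper = re.fullmatch("a", "A", re.IGNORECASE) is not None
k_matches_kelvin = re.fullmatch("k", kelvin_sign, re.IGNORECASE) is not None
i_matches_roman = re.fullmatch("i", roman_one, re.IGNORECASE) is not None

print(dot_default, dot_all)
print(starts_default, starts_multiline)
print(a_matches_upper, k_matches_kelvin, i_matches_roman)
```

In this sketch the Kelvin sign matches "k" under case-insensitive matching, while the Roman numeral does not match "i", mirroring the distinction described above.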