Chapter 14. Regular Expressions

This chapter defines the regular expression syntax accepted by the XPath functions matches(), replace(), and tokenize(), which were described in the previous chapter, as well as the <xsl:analyze-string> instruction described in Chapter 6.
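
For experimenting with the behavior of these three functions outside an XSLT processor, they can be roughly approximated in Python. What follows is a hypothetical sketch, not an implementation of the XPath functions. The helper names xpath_matches, xpath_replace, and xpath_tokenize are invented here. Python's re module also uses its own regex dialect, which accepts and interprets some patterns differently from XPath. The sketch shows the points that matter most in practice:

- matches() tests for a matching substring.
- replace() and tokenize() reject a pattern that matches the zero-length string.
- replace() writes group references as «$1» rather than Python's «\1».

```python
import re

# Hypothetical helpers that approximate the XPath 2.0 functions using
# Python's re module. Python's regex dialect differs from XPath's, so
# treat these as a sketch for simple patterns only.

# In an XPath replacement string, \\ and \$ stand for a literal "\" and "$",
# and $N refers to captured group N. (This sketch reads one digit only.)
_REPLACEMENT_TOKEN = re.compile(r"\\([\\$])|\$(\d)")


def xpath_matches(input_str: str, pattern: str) -> bool:
    # matches() is true if any substring matches; the pattern is not anchored.
    return re.search(pattern, input_str) is not None


def _reject_zero_length(pattern: str) -> None:
    # replace() and tokenize() raise an error if the pattern matches "".
    if re.search(pattern, "") is not None:
        raise ValueError(f"pattern {pattern!r} matches a zero-length string")


def xpath_replace(input_str: str, pattern: str, replacement: str) -> str:
    _reject_zero_length(pattern)

    def expand(m: re.Match) -> str:
        def substitute(tok: re.Match) -> str:
            if tok.group(1) is not None:
                return tok.group(1)      # escaped "\" or "$"
            n = int(tok.group(2))
            if n > m.re.groups:
                return ""                # no such group: zero-length string
            return m.group(n) or ""      # unmatched group: zero-length string
        return _REPLACEMENT_TOKEN.sub(substitute, replacement)

    # Replace every non-overlapping match, scanning left to right.
    return re.sub(pattern, expand, input_str)


def xpath_tokenize(input_str: str, pattern: str) -> list[str]:
    _reject_zero_length(pattern)
    if input_str == "":
        return []
    # re.split() would also return captured groups, so split manually:
    # the tokens are the substrings between successive matches.
    tokens, pos = [], 0
    for m in re.finditer(pattern, input_str):
        tokens.append(input_str[pos:m.start()])
        pos = m.end()
    tokens.append(input_str[pos:])
    return tokens
```

For example, xpath_replace("abracadabra", "a(.)", "a$1$1") returns "abbraccaddabbra". By contrast, xpath_tokenize("abba", ".?") raises ValueError, because «.?» matches the zero-length string.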

This regular expression syntax is based on the definition in XML Schema. That definition is in turn based on Perl, the language generally taken as the definitive reference for regular expressions. However, every dialect of regular expression syntax has its own minor variations. Even within Perl, some features are deprecated, some differ between Perl versions, and some don't apply when all characters are Unicode.

XML Schema defines a subset of the Perl regular expression syntax; it chose this subset based on the requirements of a language that only does validation (that is, testing whether or not a string matches the pattern) and that only deals with Unicode strings. The requirements of the matches() function in XPath are similar, but XPath also uses regular expressions for tokenizing strings and for replacing substrings. These are more complex requirements, so some of Perl's regular expression constructs that XML Schema left out have been added back in for XPath.
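
A small Python sketch can show the difference between these two sets of requirements. It uses Python's re module as a stand-in for both dialects, and the function names are hypothetical. An XML Schema pattern facet must match the whole value, which corresponds to re.fullmatch(). XPath's matches() succeeds when any substring matches, which corresponds to re.search(). A yes/no test gives the same answer whether a quantifier is greedy or reluctant, but a replacement does not. That is one reason XPath needs constructs such as the reluctant quantifier «*?».

```python
import re


def schema_valid(value: str, pattern: str) -> bool:
    # Validation in the style of an XML Schema pattern facet: the whole
    # value must match, as if the pattern were anchored at both ends.
    return re.fullmatch(pattern, value) is not None


def contains_match(value: str, pattern: str) -> bool:
    # Search in the style of XPath matches(): any matching substring counts.
    return re.search(pattern, value) is not None


def strip_tags(text: str, reluctant: bool) -> str:
    # For replacement, the choice of quantifier changes the result:
    # greedy "<.*>" runs from the first "<" to the last ">",
    # while reluctant "<.*?>" stops at the first ">" after each "<".
    pattern = r"<.*?>" if reluctant else r"<.*>"
    return re.sub(pattern, "", text)


if __name__ == "__main__":
    print(schema_valid("A123", r"[A-Z][0-9]+"))                   # True
    print(schema_valid("xA123", r"[A-Z][0-9]+"))                  # False
    print(contains_match("xA123", r"[A-Z][0-9]+"))                # True
    print(repr(strip_tags("<b>bold</b> text", reluctant=False)))  # ' text'
    print(repr(strip_tags("<b>bold</b> text", reluctant=True)))   # 'bold text'
```

Both versions of the tag pattern would give the same answer to contains_match(). Only when the matched text is removed does the difference become visible.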

In the grammar productions in this chapter, as elsewhere in the book, I generally enclose characters of the target language (that is, the regex language) in chevrons, for example «|». I have avoided ...