Making Do Without Regular Expressions

Problem

You would like to perform regular-expression-like operations but you don’t want to resort to nonstandard extensions.

Solution

Several common regular-expression-like matches can be emulated in native XSLT. Table 1-1 lists regular-expression matches in Perl syntax alongside their XSLT/XPath equivalents. The single character “C” is a proxy for any user-specified single character, and the string “abc” is a proxy for any user-supplied string of nonzero length.

Table 1-1. Regular-expression matches

Perl match                XSLT/XPath equivalent
$string =~ /^C*$/         translate($string,'C','') = ''
$string =~ /^C+$/         $string and translate($string,'C','') = ''
$string =~ /C+/           contains($string,'C')
$string =~ /C{2,4}/       contains($string,'CC') and not(contains($string,'CCCCC'))
$string =~ /^abc/         starts-with($string,'abc')
$string =~ /abc$/         substring($string, string-length($string) - string-length('abc') + 1) = 'abc'
$string =~ /abc/          contains($string,'abc')
$string =~ /^[^C]*$/      translate($string,'C','') = $string
$string =~ /^\s*$/        not(normalize-space($string))
$string =~ /\s/           translate(normalize-space($string),' ','') != $string
$string =~ /^\S*$/        translate(normalize-space($string),' ','') = $string
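
As a rough illustration of how these expressions might be used in a stylesheet, the following XSLT 1.0 sketch applies three of the table entries inside an xsl:choose. The input vocabulary (an items element containing item children) and the literals '0' and '.xml' are assumptions made up for this example; substitute your own character or string.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: an items element with item children,
       for example <items><item>0000</item><item>a.xml</item></items> -->
  <xsl:template match="/">
    <xsl:for-each select="items/item">
      <xsl:variable name="string" select="string(.)"/>
      <xsl:value-of select="$string"/>
      <xsl:text>: </xsl:text>
      <xsl:choose>
        <!-- Like /^0+$/ : nonempty, and nothing is left once every '0' is removed -->
        <xsl:when test="$string and translate($string,'0','') = ''">
          <xsl:text>all zeros</xsl:text>
        </xsl:when>
        <!-- Like /\.xml$/ : the trailing characters equal '.xml' -->
        <xsl:when test="substring($string, string-length($string) - string-length('.xml') + 1) = '.xml'">
          <xsl:text>ends with .xml</xsl:text>
        </xsl:when>
        <!-- Like /^\s*$/ : empty or whitespace only -->
        <xsl:when test="not(normalize-space($string))">
          <xsl:text>empty or whitespace</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>no match</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Binding the value to a variable with string(.) keeps the first test a string test: a nonempty string converts to true, so the expression rejects the empty string just as /^C+$/ does.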

Discussion

When it comes to brevity and power, nothing beats a good regular-expression engine. However, many simple matching operations can be emulated by more cumbersome yet effective XPath expressions. Many of these matches rely on translate( ), which removes extraneous characters so that the match can be implemented as an equality test. Another useful application of translate( ) is counting the occurrences of a specific character or set of characters. For example, the following code counts the digits in a string:

string-length(translate($string, 
          translate($string,'0123456789',''),''))

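The inner translate( ) strips the digits, leaving only the non-digit characters; the outer translate( ) then removes those characters from $string, so only the digits remain to be counted. If this is still unclear, refer to Recipe 1.3. Alternatively, you can write: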

string-length($string) - 
string-length(translate($string,'0123456789',''))

This code trades a translate( ) call for an additional string-length( ) and a subtraction. It might be slightly faster.
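
One way to reuse the subtraction form is to wrap it in a named template. The sketch below is only an illustration: the template name count-chars, its parameter names, and the sample string are assumptions introduced here, not part of the recipe.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: counts how many characters of $string occur in $chars -->
  <xsl:template name="count-chars">
    <xsl:param name="string"/>
    <xsl:param name="chars" select="'0123456789'"/>
    <!-- Total length minus the length after deleting every character in $chars -->
    <xsl:value-of select="string-length($string) -
                          string-length(translate($string, $chars, ''))"/>
  </xsl:template>

  <!-- Example call with a made-up sample string; outputs 9 -->
  <xsl:template match="/">
    <xsl:call-template name="count-chars">
      <xsl:with-param name="string" select="'Call 555-0199 ext. 42'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Passing a different value for the chars parameter, such as 'aeiou', counts a different set of characters without changing the template.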

An important way in which these XPath expressions differ from their Perl counterparts is that in Perl, special variables are set as a side effect of matching. These variables allow powerful string-processing techniques that are way beyond the scope of XSLT. If anyone attempted to mate Perl and XSLT into a hybrid language, I would want to be one of the first alpha users!

The good news is that XPath 2.0 supports regular expressions through the matches( ), replace( ), and tokenize( ) functions.
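
For comparison, here is a minimal sketch of how a few of the Table 1-1 matches might look with the XPath 2.0 matches( ) function. It assumes an XSLT 2.0 or later processor; the stylesheet parameter string and its default value are made up for the example.

```xslt
<?xml version="1.0"?>
<!-- Assumes an XSLT 2.0 (or later) processor; matches( ) is not part of XPath 1.0 -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameter holding the string to test -->
  <xsl:param name="string" select="'CCC'"/>

  <xsl:template match="/">
    <xsl:choose>
      <!-- Same test as: $string and translate($string,'C','') = '' -->
      <xsl:when test="matches($string, '^C+$')">
        <xsl:text>only C characters</xsl:text>
      </xsl:when>
      <!-- Same test as: not(normalize-space($string)) -->
      <xsl:when test="matches($string, '^\s*$')">
        <xsl:text>empty or whitespace</xsl:text>
      </xsl:when>
      <!-- Same test as: contains($string,'abc') -->
      <xsl:when test="matches($string, 'abc')">
        <xsl:text>contains abc</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>no match</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The XPath 1.0 forms in Table 1-1 remain the portable choice when the stylesheet must also run on XSLT 1.0 processors.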
