The following recursive template replaces all occurrences of a search string with a replacement string.
<xsl:template name="search-and-replace">
  <xsl:param name="input"/>
  <xsl:param name="search-string"/>
  <xsl:param name="replace-string"/>
  <xsl:choose>
    <!-- See if the input contains the search string -->
    <xsl:when test="$search-string and contains($input,$search-string)">
      <!-- If so, then concatenate the substring before the search string
           to the replacement string and to the result of recursively
           applying this template to the remaining substring. -->
      <xsl:value-of select="substring-before($input,$search-string)"/>
      <xsl:value-of select="$replace-string"/>
      <xsl:call-template name="search-and-replace">
        <xsl:with-param name="input" select="substring-after($input,$search-string)"/>
        <xsl:with-param name="search-string" select="$search-string"/>
        <xsl:with-param name="replace-string" select="$replace-string"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- There are no more occurrences of the search string, so
           just return the current input string -->
      <xsl:value-of select="$input"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
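As a minimal sketch of how the template might be called, the stylesheet below replaces every occurrence of "at" in a literal string. The file name search-and-replace.xslt is hypothetical; it is assumed to hold the template above. The input string is likewise only an illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical file name: assumed to contain the
       search-and-replace template shown above -->
  <xsl:include href="search-and-replace.xslt"/>

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Every occurrence of 'at' is replaced, including those inside words -->
    <xsl:call-template name="search-and-replace">
      <xsl:with-param name="input" select="'the cat sat on the mat'"/>
      <xsl:with-param name="search-string" select="'at'"/>
      <xsl:with-param name="replace-string" select="'og'"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

Run against any input document, this should emit "the cog sog on the mog". The template matches substrings wherever they occur, with no regard for word boundaries.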
If you want to replace only whole words, then you must ensure that the characters immediately before and after the search string are in the class of characters considered word delimiters. We chose the characters in the variable $punc plus whitespace to be word delimiters:
<xsl:template name="search-and-replace-whole-words-only">
  <xsl:param name="input"/>
  <xsl:param name="search-string"/>
  <xsl:param name="replace-string"/>
  <!-- The ampersand and double quote must be written as entities
       inside a double-quoted attribute value -->
  <xsl:variable name="punc" select="concat('.,;:( )[ ]!?$@&amp;&quot;',&quot;'&quot;)"/>
  <xsl:choose>
    <!-- See if the input contains the search string. The first test
         guards against an empty search string, which would otherwise
         recurse forever. -->
    <xsl:when test="$search-string and contains($input,$search-string)">
      <!-- If so, then test that the before and after characters are
           word delimiters. -->
      <xsl:variable name="before" select="substring-before($input,$search-string)"/>
      <xsl:variable name="before-char"
                    select="substring(concat(' ',$before),string-length($before) + 1, 1)"/>
      <xsl:variable name="after" select="substring-after($input,$search-string)"/>
      <xsl:variable name="after-char" select="substring($after,1,1)"/>
      <xsl:value-of select="$before"/>
      <xsl:choose>
        <xsl:when test="(not(normalize-space($before-char)) or
                         contains($punc,$before-char)) and
                        (not(normalize-space($after-char)) or
                         contains($punc,$after-char))">
          <xsl:value-of select="$replace-string"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$search-string"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:call-template name="search-and-replace-whole-words-only">
        <xsl:with-param name="input" select="$after"/>
        <xsl:with-param name="search-string" select="$search-string"/>
        <xsl:with-param name="replace-string" select="$replace-string"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- There are no more occurrences of the search string, so
           just return the current input string -->
      <xsl:value-of select="$input"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
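The sketch below shows one way the whole-words variant might be invoked. The file name whole-words.xslt is hypothetical; it is assumed to hold the template above with a well-formed $punc definition. The input string was chosen so that the search string appears both as a whole word and inside a longer word.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical file name: assumed to contain the
       search-and-replace-whole-words-only template shown above -->
  <xsl:include href="whole-words.xslt"/>

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- 'cat' inside 'concatenate' is bounded by letters, so it is kept;
         the first and last 'cat' are bounded by the string start,
         whitespace, or a period -->
    <xsl:call-template name="search-and-replace-whole-words-only">
      <xsl:with-param name="input" select="'cat concatenate cat.'"/>
      <xsl:with-param name="search-string" select="'cat'"/>
      <xsl:with-param name="replace-string" select="'dog'"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

With these parameters the result should be "dog concatenate dog.". The start and end of the input count as delimiters because the before and after characters are then a padding space or the empty string, and both pass the not(normalize-space(...)) test.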
Tip
Notice how we construct $punc using concat( ) so it contains both single and double quotes. A single XPath 1.0 string literal cannot contain both kinds of quote because XPath and XSLT, unlike C, do not allow special characters to be escaped with a backslash (\). XPath 2.0 allows a quote inside a string literal to be escaped by doubling it.
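For comparison, here is a minimal sketch of the doubled-quote form, assuming an XSLT 2.0 processor. The stylesheet simply prints the variable so the result can be inspected. It is an illustration only and is not used by the templates in this recipe.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Inside a single-quoted XPath 2.0 literal, '' stands for one
       apostrophe. The double quote and ampersand still need XML
       entities because the attribute value is double-quoted. -->
  <xsl:variable name="punc" select="'.,;:( )[ ]!?$@&amp;&quot;'''"/>

  <xsl:template match="/">
    <xsl:value-of select="$punc"/>
  </xsl:template>

</xsl:stylesheet>
```

The XML-level escaping (&amp;amp; and &amp;quot;) and the XPath-level escaping (the doubled apostrophe) are separate layers. The XML parser resolves the entities first, and the XPath parser then sees a literal that contains a double quote followed by a doubled apostrophe.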
Searching and replacing is a common text-processing task. The solution shown here is the most straightforward implementation of search and replace written purely in terms of XSLT. At first glance, it might look inefficient: for each occurrence of the search string, the code calls contains( ), substring-before( ), and substring-after( ). Presumably, each function rescans the input string for the search string, so this approach seems to perform two more searches than necessary. After some thought, you might come up with one of the seemingly more efficient solutions shown in Example 1-4 and Example 1-5.
Example 1-4. Using a temp string in a failed attempt to improve search and replace
<xsl:template name="search-and-replace">
  <xsl:param name="input"/>
  <xsl:param name="search-string"/>
  <xsl:param name="replace-string"/>
  <!-- Find the substring before the search string and store it in a
       variable -->
  <xsl:variable name="temp" select="substring-before($input,$search-string)"/>
  <xsl:choose>
    <!-- If $temp is not empty or the input starts with the search
         string then we know we have to do a replace. This eliminates
         the need to use contains( ). -->
    <xsl:when test="$temp or starts-with($input,$search-string)">
      <xsl:value-of select="concat($temp,$replace-string)"/>
      <xsl:call-template name="search-and-replace">
        <!-- We eliminate the need to call substring-after by using the
             length of temp and the search string to extract the
             remaining string in the recursive call. -->
        <xsl:with-param name="input"
                        select="substring($input,string-length($temp) + string-length($search-string) + 1)"/>
        <xsl:with-param name="search-string" select="$search-string"/>
        <xsl:with-param name="replace-string" select="$replace-string"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$input"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
Example 1-5. Using a temp integer in a failed attempt to improve search and replace
<xsl:template name="search-and-replace">
  <xsl:param name="input"/>
  <xsl:param name="search-string"/>
  <xsl:param name="replace-string"/>
  <!-- Find the length of the sub-string before the search string and
       store it in a variable -->
  <xsl:variable name="temp"
                select="string-length(substring-before($input,$search-string))"/>
  <xsl:choose>
    <!-- If $temp is not 0 or the input starts with the search string
         then we know we have to do a replace. This eliminates the need
         to use contains( ). -->
    <xsl:when test="$temp or starts-with($input,$search-string)">
      <xsl:value-of select="substring($input,1,$temp)"/>
      <xsl:value-of select="$replace-string"/>
      <!-- We eliminate the need to call substring-after by using temp
           and the length of the search string to extract the remaining
           string in the recursive call. -->
      <xsl:call-template name="search-and-replace">
        <xsl:with-param name="input"
                        select="substring($input,$temp + string-length($search-string) + 1)"/>
        <xsl:with-param name="search-string" select="$search-string"/>
        <xsl:with-param name="replace-string" select="$replace-string"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$input"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
The idea behind both attempts is that if you remember the spot where substring-before( ) finds a match, then you can use this information to eliminate the need to call contains( ) and substring-after( ). You are forced to introduce a call to starts-with( ) to disambiguate the case in which substring-before( ) returns the empty string; this can happen when the search string is absent or when the input string starts with the search string. However, starts-with( ) is presumably faster than contains( ) because it doesn’t need to scan past the length of the search string. The second attempt differs from the first in storing an integer offset, on the theory that this is more efficient than storing the entire substring.
Alas, these supposed optimizations fail to produce any improvement when using the Xalan XSLT implementation and actually produce timing results that are an order of magnitude slower on some inputs when using either Saxon or XT! My first hypothesis regarding this unintuitive result was that the use of the variable $temp in the recursive call interfered with Saxon’s tail-recursion optimization (see Recipe 1.6). However, by experimenting with large inputs that have many matches, I failed to cause a stack overflow. My next suspicion was that for some reason, XSLT substring( ) is actually slower than the substring-before( ) and substring-after( ) calls. Michael Kay, the author of Saxon, indicated that Saxon’s implementation of substring( ) was slow due to the complicated rules that XSLT substring must implement, including floating-point rounding of arguments, handling special cases where the start or end point is outside the bounds of the string, and issues involving Unicode surrogate pairs. In contrast, substring-before( ) and substring-after( ) translate more directly into Java.
The real lesson here is that optimization is tricky business, especially in XSLT where there can be a wide disparity between implementations and where new versions continually apply new optimizations. Unless you are prepared to profile frequently, it is best to stick with simple solutions. An added advantage of obvious solutions is that they are likely to behave consistently across different XSLT implementations.