Name

[2.0] tokenize()

Breaks a string into a sequence of strings, using a regular expression as a separator.

Syntax

xs:string* tokenize($input as xs:string?, $pattern as xs:string)
xs:string* tokenize($input as xs:string?, $pattern as xs:string, 
                    $flags as xs:string)

Inputs

A string to be tokenized and a regular expression that identifies the separators. The tokenize() function also accepts an optional third argument, a string of flags that change how the regular expression is evaluated.
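
As a minimal sketch of the two call forms, the stylesheet below tokenizes one string with the two-argument form and another with the three-argument form. The sample strings and the " | " separator are made up for illustration, and the template assumes the stylesheet is run against any source document, since it only needs the document node to fire.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Two-argument form: split on a literal hyphen. -->
    <xsl:value-of select="tokenize('2007-05-01', '-')" separator=" | "/>
    <xsl:text>&#10;</xsl:text>

    <!-- Three-argument form: the "i" flag makes matching
         case-insensitive, so "X" and "x" are both separators. -->
    <xsl:value-of select="tokenize('oneXtwoxthree', 'x', 'i')"
                  separator=" | "/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With an XSLT 2.0 processor, this should write "2007 | 05 | 01" on the first line and "one | two | three" on the second.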

Note

It is a fatal error if a regular expression matches a zero-length string. See Appendix E for more details.
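
To illustrate the note above, here is a small sketch contrasting a pattern that can match a zero-length string with one that cannot. The input string 'axxb' is a made-up example; the failing call appears only inside a comment so that the stylesheet itself still runs.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Not allowed: the pattern 'x*' can match zero characters,
         so tokenize('axxb', 'x*') is an error, not a result. -->

    <!-- Allowed: 'x+' needs at least one character per match,
         so the run "xx" is treated as a single separator. -->
    <xsl:value-of select="tokenize('axxb', 'x+')" separator=","/>
  </xsl:template>

</xsl:stylesheet>
```

The safe form should produce "a,b". Replacing a * quantifier with + is often the simplest way to rule out zero-length matches.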

Outputs

A sequence of xs:strings, each of which represents a token parsed from the original string. The returned strings do not contain the separator.

Here are the full details of how tokenize() works:

  • If the input string is the empty sequence or a zero-length string, the empty sequence is returned.

  • If the regular expression doesn’t match anything in the input string, a singleton sequence containing the original input string is returned.

  • If the regular expression matches the start of the string, the first string in the returned sequence will be an empty string (""). Similarly, if the regular expression matches the end of the string, the last string in the returned sequence will be an empty string.

  • If the regular expression matches two overlapping strings in the input string, only the first match is treated as a separator.

  • The regular expression cannot be a zero-length string, nor can it match a zero-length string (in other words, matches("", $pattern, $flags) ...
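
The first three rules in the list above can be checked with a short stylesheet. This is only a sketch: the sample strings are invented, and the square brackets are added around each token purely to make empty strings visible in the output.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Zero-length input: the result is the empty sequence. -->
    <xsl:value-of select="count(tokenize('', ','))"/>
    <xsl:text>&#10;</xsl:text>

    <!-- No match: the result is the original string, unchanged. -->
    <xsl:value-of select="tokenize('abc', ',')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Separator at the start and end: the first and last
         tokens are empty strings. -->
    <xsl:value-of select="string-join(
        for $t in tokenize(',a,b,', ',')
        return concat('[', $t, ']'), '')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The three output lines should be "0", "abc", and "[][a][b][]". The last line shows four tokens, two of them empty.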
