Name
[2.0] tokenize()
Breaks a string into a sequence of strings, using a regular expression as a separator.
Syntax
xs:string* tokenize($input as xs:string?, $pattern as xs:string)
xs:string* tokenize($input as xs:string?, $pattern as xs:string, $flags as xs:string)
Inputs
A string to be tokenized and a regular expression that identifies the separators. The tokenize() function also accepts an optional third string containing flags that change how the regular expression is evaluated.
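As a minimal sketch of the two call forms, the stylesheet below tokenizes two literal strings. The sample strings and the match="/" wrapper template are illustrative assumptions, not part of the original entry; any source document will do because its content is never read.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Runs against any source document; the input itself is not used. -->
  <xsl:template match="/">
    <!-- Two arguments: split on a comma with optional surrounding whitespace. -->
    <xsl:value-of select="tokenize('red , green,blue', '\s*,\s*')" separator="|"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Three arguments: the "i" flag makes the separator case-insensitive. -->
    <xsl:value-of select="tokenize('oneXtwoxthree', 'x', 'i')" separator="|"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With an XSLT 2.0 processor this should print red|green|blue on the first line and one|two|three on the second.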
Note
It is a fatal error if a regular expression matches a zero-length string. See Appendix E for more details.
Outputs
A sequence of xs:strings,
each of which represents a token parsed from the original string.
The returned strings do not contain the separator.
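Because the result is an ordinary sequence of xs:string values, it can be counted, indexed, and rejoined like any other sequence. The following sketch assumes a hypothetical date string and a variable named $parts, both chosen only for illustration.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The hyphens are separators, so they do not appear in the tokens. -->
    <xsl:variable name="parts" select="tokenize('2024-01-15', '-')"/>
    <!-- Number of tokens: 3 -->
    <xsl:value-of select="count($parts)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Second token: 01 -->
    <xsl:value-of select="$parts[2]"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Rejoined with a different separator: 2024/01/15 -->
    <xsl:value-of select="string-join($parts, '/')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```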
Here are the full details of how tokenize() works:
If the first string is the empty sequence or a zero-length string, the empty sequence is returned.
If the regular expression doesn’t match anything in the input string, a singleton sequence containing the original input string is returned.
If the regular expression matches the start of the string, the first string in the returned sequence will be an empty string (""). Similarly, if the regular expression matches the end of the string, the last string in the returned sequence will be an empty string.
If the regular expression matches two overlapping strings in the input string, only the first match is used as a separator.
The regular expression cannot be a zero-length string, nor can it match a zero-length string (in other words, matches("", $pattern, $flags) ...
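The rules listed above can be checked with a short stylesheet. This is a sketch under stated assumptions: the literal inputs are made up for illustration, and the zero-length case is left as a comment because a conforming processor raises an error (FORX0003 in the XPath 2.0 functions specification) instead of returning a result.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Zero-length input and the empty sequence both yield no tokens: 0 0 -->
    <xsl:value-of select="count(tokenize('', ',')), count(tokenize((), ','))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- No match anywhere: a singleton holding the original string: 1 abc -->
    <xsl:value-of select="count(tokenize('abc', ',')), tokenize('abc', ',')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Separators at both ends add a leading and a trailing empty string: 4 -->
    <xsl:value-of select="count(tokenize(',a,b,', ','))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Joined with "|" the empty tokens show up at the edges: |a|b| -->
    <xsl:value-of select="tokenize(',a,b,', ',')" separator="|"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Where both alternatives could match, the first one wins: |r|c|d|r| -->
    <xsl:value-of select="tokenize('abracadabra', '(ab)|(a)')" separator="|"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Not executed: tokenize('abc', 'x*') is an error, since x* matches "". -->
  </xsl:template>
</xsl:stylesheet>
```

The first two lines of output rely on the default single-space separator that xsl:value-of uses for sequences in XSLT 2.0.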