Text and Empty Patterns, Whitespace, and Mixed Content
So far, we have used text patterns only within group patterns. It’s important to remember, however, that a text pattern doesn’t mean simply one text node but rather zero or more text nodes. This statement deserves some exploration.
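As a minimal sketch of what “zero or more text nodes” implies, consider the pattern below. The element name title is hypothetical and chosen only for illustration; the instances listed in the comment are the ones this pattern would be expected to accept.

```xml
<!-- A text pattern as the whole content of an element.
     Instances expected to be valid against it:
       <title/>                  (no text node at all)
       <title></title>           (empty content)
       <title>RELAX NG</title>   (one text node)
-->
<element name="title" xmlns="http://relaxng.org/ns/structure/1.0">
  <text/>
</element>
```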
The reason why text patterns accept zero text nodes is linked to the policy adopted by RELAX NG regarding whitespace. Whitespace processing rules are one of the fuzzier areas in XML. RELAX NG has attempted to find the “least surprising” policy that supports the most common usages. You’ll see more about whitespace processing when we study datatypes, but for now, let’s say that RELAX NG doesn’t distinguish among an empty string, no string at all, a string containing only whitespace before or after an element node, and, to a lesser extent, a single text child node containing only whitespace.
For instance, in the following snippet:
    <foo at1="" at2=" ">
      <bar/>
      <bar></bar>
      <bar>
        <baz/>
        <baz/>
      </bar>
      <bar> </bar>
    </foo>
RELAX NG treats as insignificant the values of at1 and at2, the content of the first and second bar elements, the text between the third bar start tag and the first baz element, the text between the two baz elements, and even the text within the last bar element.
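To make this concrete, here is a hedged sketch of a schema that the snippet above should satisfy. It is an illustration written for this discussion, not a schema taken from the text. Note that the bar element declares no text pattern at all, yet the whitespace between and around the baz elements, and the whitespace-only content of the last bar, should not cause a validation error.

```xml
<element name="foo" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Attribute values are matched against text, which accepts "" and " " -->
  <attribute name="at1"><text/></attribute>
  <attribute name="at2"><text/></attribute>
  <oneOrMore>
    <element name="bar">
      <!-- No text pattern here: whitespace-only text is simply ignored -->
      <zeroOrMore>
        <element name="baz"><empty/></element>
      </zeroOrMore>
    </element>
  </oneOrMore>
</element>
```

If any bar element contained non-whitespace text, such as <bar>x</bar>, this schema would be expected to reject it, because nothing in the content model of bar allows significant text.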
RELAX NG’s rules state that the content should match
either text or empty patterns. Here are two visible consequences for
the patterns we’ve seen so far:
Because text patterns match any text node, they must match strings that are either ...