Common Patterns
After this overview of the syntax used by pattern facets, let’s look at some common pattern facets that you may use or adapt in your schemas, or simply treat as examples.
String Datatypes
Regular expressions treat information in its textual form. This makes them an excellent mechanism for constraining strings.
Unicode blocks
Unicode is one of XML’s greatest assets. However, few applications can process and display all the characters of the Unicode set correctly, and still fewer users can read them! If you need to check that the values of your string datatypes belong to one (or more) Unicode blocks, you can use these pattern facets:
<define name="BasicLatinToken">
  <data type="token">
    <param name="pattern">\p{IsBasicLatin}*</param>
  </data>
</define>
<define name="Latin-1Token">
  <data type="token">
    <param name="pattern">[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>
  </data>
</define>
or:
BasicLatinToken = xsd:token {pattern = "\p{IsBasicLatin}*"}
Latin-1Token = xsd:token {pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"}
Note that such pattern facets don’t impose a character encoding on the document itself. For instance, the Latin-1Token datatype validates instance documents using UTF-8, UTF-16, ISO-8859-1, or another encoding. (This statement assumes the characters used in this string belong to the two Unicode blocks BasicLatin and Latin-1Supplement.) In other words, even the lexical space reflects some processing done by the parser, below the level ...
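The encoding independence described here can be illustrated outside a RELAX NG validator. The following is a minimal Python sketch, not the validator's own mechanism. It assumes that the BasicLatin block covers U+0000 through U+007F and the Latin-1Supplement block covers U+0080 through U+00FF, and it spells the blocks out as code-point ranges because Python's built-in re module does not support the \p{Is...} block escapes. The function names are hypothetical.

```python
import re

# Assumption: the two Unicode blocks are written as explicit code-point
# ranges, since Python's re module has no \p{IsBasicLatin} escape.
BASIC_LATIN = re.compile(r"[\u0000-\u007F]*")  # BasicLatin only
LATIN_1 = re.compile(r"[\u0000-\u00FF]*")      # BasicLatin + Latin-1Supplement


def is_basic_latin_token(text: str) -> bool:
    """Return True if every character lies in the BasicLatin block."""
    return BASIC_LATIN.fullmatch(text) is not None


def is_latin1_token(text: str) -> bool:
    """Return True if every character lies in BasicLatin or Latin-1Supplement."""
    return LATIN_1.fullmatch(text) is not None


if __name__ == "__main__":
    sample = "caf\u00e9"  # "café": the last character is U+00E9
    for encoding in ("utf-8", "utf-16", "iso-8859-1"):
        # The bytes on disk differ per encoding, but decoding always
        # yields the same characters, and the pattern sees only those.
        decoded = sample.encode(encoding).decode(encoding)
        print(encoding, is_latin1_token(decoded), is_basic_latin_token(decoded))
```

Whatever the byte-level encoding, the check operates on the decoded characters, which is why the same value passes the Latin-1 check in all three cases while failing the BasicLatin one.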