RELAX NG

First Compact Patterns

Let’s explore how the patterns described in the previous chapter translate into the compact syntax.

The text Pattern

text is the simplest pattern in the XML syntax and is the simplest in the compact syntax as well. The text pattern is just:

text

In this definition, the word text identifies the text pattern.

Of course, because both syntaxes are equivalent, all that’s been said about text in RELAX NG’s XML syntax also applies to text in the compact syntax.

The attribute Pattern

For the compact syntax, the attribute pattern borrows Java’s curly brackets:

 attribute id { text }

In this definition, the first word, attribute, identifies the attribute pattern; the second one, id, is the name of the attribute. The curly brackets, {...}, delimit the definition of the content of the attribute.

Because empty curly brackets ({}) look odd and might suggest empty attributes rather than attributes containing a text value, the compact syntax abandons the XML syntax's convention of making a text pattern the implicit content of attributes. The content of attributes must be defined explicitly when you're using the compact syntax. In other words, in the compact syntax, the following:

<attribute name="id"/>

translates into:

 attribute id { text }

while this:

attribute id { }

translates into a syntax error.
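If you really do want an attribute whose value must be empty, the content still has to be spelled out. As a minimal sketch, assuming the empty pattern (which this section doesn't cover), that would be written:

attribute id { empty }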

The compact syntax is position-sensitive, and words such as text and attribute are reserved words only when they appear in the first position. This is very convenient when you need to define attributes (or elements) whose names are the same as reserved words. For instance, you can define attributes named text or even attribute without any special precaution:

attribute text { text } 
attribute attribute { text }

Because the compact syntax is position-sensitive, it isn’t confused when reserved words are used as attribute names. This is also true for the element pattern which you’ll see in the next section.
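As a quick illustration of the same rule applied to elements, here is a sketch in which the names element and text are used as element names. The names are chosen only to make the point; they don't come from any schema in this book:

element element { text }
element text { attribute attribute { text }, text }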

The element Pattern

The simplest definition of the name element is:

element name { text }

To add an attribute to an element, you need a delimiter between the different pieces of content. You’ll see more use of delimiters and their meanings in Chapter 6, but for now, let’s use a comma as delimiter between content. This has the same effect as with XML syntax:

element title { attribute xml:lang { text }, text }

Whitespace (i.e., spaces, tabs, line feeds, and carriage returns) isn't significant in the compact syntax. The previous bit of code could also have been written:

element title {attribute xml:lang{ text }, text}

Many people prefer to use whitespace to put one definition per line. Such a layout guides a human reader through the structure, but it makes no difference to a RELAX NG processor, which treats both forms as equivalent.
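Laid out with one definition per line, the same title element might look like this (the two-space indentation is just one possible convention):

element title {
  attribute xml:lang { text },
  text
}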

The author element can be defined using more of the same components:

element author { attribute id { text }, element name { text },
    element born { text }, element died { text } }

Again, all that I’ve said about the properties of the element pattern in the XML syntax is true for the compact syntax: these are just two equivalent syntaxes for the same pattern.

The optional Pattern

The optional pattern is written as a trailing ? after a definition, just as in DTDs. For example, to define the attribute id as optional, you'd write:

attribute id { text }?

Note that the qualifier ? must be added after the definition of the pattern but before the delimiter. If you used this qualifier in the larger definition of the author element, it’d therefore look like this:

element author { attribute id { text }, element name { text },
    element born { text }?, element died { text }? }

In Chapter 3, I mentioned that other combinations of optional and required elements can be described using the optional pattern as a container. In the compact syntax, the optional pattern is represented as a qualifier rather than a container, so you need a separate container if you wish to create the same combinations. The container is a set of parentheses (). The effect of parentheses depends on the qualifier, if any, that follows them. Parentheses without a qualifier are effectively transparent; they do nothing. The definition of author can be written as:

element author {( attribute id { text }, element name { text },
    element born { text }?, element died { text }? )}

or:

element author { (attribute id { text }), (element name { text }),
    (element born { text })?, (element died { text }?) }

without changing its meaning. Parentheses are more useful (and are actually required) to write the combinations mentioned in Chapter 3. Combinations such as:

<optional>
  <element name="born"> <text/> </element>
  <element name="died"> <text/> </element>
</optional>

translate into:

(element born { text }, element died { text })?

The following:

<optional>
  <element name="born"> <text/> </element>
  <optional>
    <element name="died"> <text/> </element>
  </optional>
</optional>

translates into:

(element born { text }, element died { text }? )?
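As a sketch of how this nested group fits into a larger definition, the author element could use it in place of two independently optional elements. With this content model, born may appear alone or together with died, but died can't appear without born:

element author {
  attribute id { text },
  element name { text },
  (element born { text }, element died { text }?)?
}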

The oneOrMore Pattern

The oneOrMore pattern is also a qualifier and, in the DTD tradition, is a plus sign (+):

element author { attribute id { text }, element name { text },
    element born { text }?, element died { text }? }+

The zeroOrMore Pattern

Last but not least, the zeroOrMore pattern is the asterisk (*) qualifier:

element character { attribute id { text }, element name { text },
    element born { text }?, element qualification { text } }*
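To see the three qualifiers side by side, here is a small sketch of a single content model that uses all of them. The book and subtitle elements and the simplified text-only content are assumptions made for this illustration. Comments in the compact syntax start with # and run to the end of the line:

element book {
  element title { text },        # exactly one (no qualifier)
  element subtitle { text }?,    # zero or one
  element author { text }+,      # one or more
  element character { text }*    # zero or more
}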


RELAX NG by Eric van der Vlist