Learning XSLT

By Michael Fitzgerald


Table of Contents

Chapter 1: Transforming Documents with XSLT
Extensible Stylesheet Language Transformations, or XSLT, is a straightforward language that allows you to transform existing XML documents into new XML, Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or plain text documents. XML Path Language, or XPath, is a companion technology to XSLT that helps identify and find nodes in XML documents—elements, attributes, and other structures.
Here are a few ways you can put XSLT to work:
  • Transforming an XML document into an HTML or XHTML document for display in a web browser
  • Converting from one markup vocabulary to another, such as from DocBook (http://www.docbook.org) to XHTML
  • Extracting plain text out of an XML document for use in a non-XML application or environment
  • Building a new German language document by pulling and repurposing all the German text from a multilingual XML document
This is barely a start. There are many other ways that you can use XSLT, and you'll get acquainted with a number of them in the chapters that follow.
This book assumes that you don't know much about XSLT, but that you are ready to put it to work. Through a series of numerous hands-on examples, Learning XSLT guides you through many features of XSLT 1.0 and XPath 1.0, while at the same time introducing you to XSLT 2.0 and XPath 2.0.
If you don't know much about XML yet, it shouldn't be a problem because I'll also cover many of the basics of XML in this book. Technical terms are usually defined when they first appear and in a glossary at the end of the book. The XML specification is located at http://www.w3.org/TR/REC-xml.html.
Another specification closely related to XSLT is Extensible Stylesheet Language, or XSL, commonly referred to as XSL-FO (see http://www.w3.org/TR/xsl/). XSL-FO is a language for applying styles and formatting to XML documents. It is similar to Cascading Style Sheets (CSS), but it is written in XML and is somewhat more extensive. (
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
How XSLT Works
About the quickest way to get you acquainted with how XSLT works is through simple, progressive examples that you can do yourself. The first example walks you through the process of transforming a very brief XML document using a minimal XSLT stylesheet. You transform documents using a processor that complies with the XSLT 1.0 specification.
All the documents and stylesheets discussed in this book can be found in the example archive available for download at http://www.oreilly.com/catalog/learnxslt/learningxslt.zip. All example files mentioned in a particular chapter are in the examples directory of the archive, under the subdirectory for that chapter (such as examples/ch01, examples/ch02, and so forth). Throughout the book, I assume that these examples are installed at C:\LearningXSLT\examples on Windows or in something like /usr/mike/learningxslt/examples on a Unix machine.
Now consider the ridiculously brief XML document contained in the file msg.xml:
<msg/>
There isn't much to this document, but it's perfectly legal, well-formed XML. It's just a single, empty element with no content. Technically, it's an empty element tag.
Because it is the only element in the document, msg is the document element. The document element is sometimes called the root element, but this is not to be confused with the root node, which will be explained later in this chapter. The first element in any well-formed XML document is always considered the document element, as long as it also contains all other elements in the document (if it has any other elements in it). In order for XML to be well-formed, it must follow the syntax rules laid out in the XML specification. I'll highlight well-formedness rules throughout this book, when appropriate.
A document element is the minimum structure needed to have a well-formed XML document, assuming that the characters used for the element name are legal XML name characters, as they are in the case of
Using Client-Side XSLT in a Browser
Now comes the action. An XSLT processor is probably readily available to you on your computer in a browser such as Microsoft Internet Explorer (IE) Version 6 or later, Netscape Navigator (Netscape) Version 7.1 or later, or Mozilla Version 1.4 or later. All three of these browsers have client-side XSLT processing ability built in.
A common way to apply an XSLT stylesheet like msg.xsl to the document msg.xml in a browser is by using a processing instruction. You can see a processing instruction in a slightly altered version of msg.xml called msg-pi.xml. Open the file msg-pi.xml from examples/ch01 with one of the browsers mentioned. The result tree (a result twig, really) is displayed. Figure 1-1 shows you what the result looks like in IE Version 6, with service pack 1 (SP1). I explain how msg-pi.xml works in the section "The XML Stylesheet Processing Instruction," which follows.
Figure 1-1: Transforming msg-pi.xml with Internet Explorer
When the XSLT processor in the browser found the pattern identified by the template in msg.xsl, it wrote the string Found it! onto the browser's canvas or rendering space.
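The stylesheet msg.xsl itself is not reproduced in this excerpt. As a minimal sketch, assuming it contains a single template that matches msg and writes the literal string, it could look something like this (the use of the text output method is an assumption):

```xml
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<!-- Assumed: plain text output, so no XML declaration is written -->
<output method="text"/>

<!-- When the msg element is found, write a literal string to the result tree -->
<template match="msg">Found it!</template>

</stylesheet>
```

The match attribute holds the pattern, and the content of the template element is what gets written to the result when that pattern is found.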
If you look at the source for the page using View Source or View Page Source, you will see that the source tree for the transformation (the document msg-pi.xml) is displayed, not the result tree.
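The file msg-pi.xml is not shown in this excerpt either. A sketch of what such a document might look like, assuming the stylesheet sits in the same directory and that the browser accepts the type value text/xsl:

```xml
<?xml version="1.0"?>
<!-- Assumed href and type values; the processing instruction names the stylesheet to apply -->
<?xml-stylesheet href="msg.xsl" type="text/xsl"?>

<msg/>
```

The processing instruction must appear in the prolog, before the document element, for the browser to act on it.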
Using apply-templates
One possible element that can be contained inside a template element is apply-templates. Because apply-templates is contained in template, it is called a child element of template. In XSLT, apply-templates is also termed an instruction element. An instruction element in XSLT is always contained within something called a template. A template is a series of transformation instructions that usually appear within a template element, but not always. A few other elements can contain instructions, as you will see later on. XSLT 1.0 has a number of instruction elements that will eventually be explained and discussed in this book.
The apply-templates element triggers the processing of the children of the node in the source document that the template matches. These children (child nodes) can be elements, text, comments, and processing instructions; attributes are not children of an element and are processed only when they are explicitly selected. If the apply-templates element has a select attribute, the XSLT processor processes only the nodes selected by the value of that attribute instead of all the children. These nodes are then subject to being processed by other templates in the stylesheet that match those nodes.
Let's not fret about what all that means right now. It's hard to follow exactly what XSLT is doing when you are just starting out. I'll cover more about how apply-templates works in the next chapter.
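As a rough sketch of the difference, assume a source document whose document element is a hypothetical note element with heading and body children (none of these names come from the book's examples). The first template below processes every child of the root node, while the second processes only the heading children of note:

```xml
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<!-- No select attribute: process every child node of the root node -->
<template match="/">
 <apply-templates/>
</template>

<!-- With select: process only the heading children of note (hypothetical names) -->
<template match="note">
 <apply-templates select="heading"/>
</template>

</stylesheet>
```

With this stylesheet, the text inside heading reaches the result, but the text inside body does not, because body is never selected.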
To understand how apply-templates works, first take a look at the document message.xml in examples/ch01:
<?xml version="1.0"?>
  
<message priority="low">Hey, XSLT isn't so hard after all!</message>
The message element in message.xml has an attribute in its start tag: the priority attribute with a value of low. Also, this element is not empty; it holds the string Hey, XSLT isn't so hard after all! In the terminology of XML, this text is called parsed character data, and in the terminology of XPath, this text is called a
Summary
This chapter has given you a little taste of XSLT—how it works and a few things you can do with it. After reading this introduction, you should understand the ground rules of XSLT stylesheets and the steps involved in transforming documents with a browser, a command-line processor like Xalan, or a processor with a graphical interface, such as xRay2. In the next chapter, you will learn how to create elements, attributes, text, comments, and processing instructions in a result tree using both XSLT instruction elements and literal result elements.
Chapter 2: Building New Documents with XSLT
In the first chapter of this book, you got acquainted with the basics of how XSLT works. This chapter will take you a few steps further by showing you how to add text and markup to your result tree with XSLT templates.
First, you'll add literal text to your output. Then you'll work with literal result elements, that is, elements that are represented literally in templates. You'll also learn how to add content with the text, element, attribute, attribute-set, comment, and processing-instruction elements. In addition, you'll get your first encounter with attribute value templates, which provide a way to define templates inside attribute values.
Outputting Text
You can put plain, literal text into an XSLT template, and it will be written to a result tree when the template containing the text is processed. You saw this work in the very first example in the book (msg.xsl in Chapter 1). I'll go into more detail about adding literal text in this section.
Look at the single-element document text.xml in examples/ch02 (this directory is where all example files mentioned in this chapter can be found):
<?xml version="1.0"?>
  
<message>You can easily add text to your output.</message>
With text.xml in mind, consider the stylesheet txt.xsl:
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>
  
<template match="/">Message: <apply-templates/></template>

</stylesheet>
When applied to text.xml, here is what generally happens, although the actual order of events may vary internally in a processor:
  1. The template rule in txt.xsl matches the root node (/), the beginning point of the source document.
  2. The implicit, built-in template for elements then matches message.
  3. The text "Message: " (including one space) is written to the result tree.
  4. apply-templates processes the text child node of message using the built-in template for text.
  5. The built-in template for text picks up the text node "You can easily add text to your output."
  6. The output is serialized.
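The built-in templates mentioned in these steps are never written in txt.xsl; the processor supplies them. Written out explicitly, the built-in rules for elements and text defined in Section 5.8 of the XSLT 1.0 specification look like this (shown here without a prefix, to match txt.xsl):

```xml
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">

<!-- Built-in rule for the root node and for elements: keep going down the tree -->
<template match="*|/">
 <apply-templates/>
</template>

<!-- Built-in rule for text and attribute nodes: copy the text through -->
<template match="text()|@*">
 <value-of select="."/>
</template>

</stylesheet>
```

There are also built-in rules for comments and processing instructions, but they do nothing. A template you write yourself always takes precedence over a built-in rule that matches the same node.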
Apply txt.xsl to text.xml using Xalan:
xalan text.xml txt.xsl
This gives you the following output:
Message: You can easily add text to your output.
The txt.xsl stylesheet writes the little tidbit of literal text, "Message: ", from its template onto the output, and also grabs some text out of
Literal Result Elements
A literal result element is any XML element that is represented literally in a template, is not in the XSLT namespace, and is written literally onto the result tree when processed. Such elements must be well-formed within the stylesheet, according to the rules in XML 1.0.
The example stylesheet tedious.xsl, which produces XML output, contains an instance of the msg literal result element from a different namespace:
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="xml" indent="yes"/>
<template match="/">
 <msg xmlns="http://www.wyeast.net/msg">
  <apply-templates xmlns="http://www.w3.org/1999/XSL/Transform"/>
 </msg>
</template>
  
</stylesheet>
Here is literal.xml:
<?xml version="1.0"?>
  
<message>You can use literal result elements in stylesheets.</message>
If you apply this stylesheet to literal.xml:
xalan literal.xml tedious.xsl
you will get this output:
<?xml version="1.0" encoding="UTF-8"?>
<msg xmlns="http://www.wyeast.net/msg">You can use literal result elements in 
stylesheets.</msg>
Because this stylesheet uses the XML output method, an XML declaration was written to the result tree. The literal result element, along with its namespace declaration, was also written.
In tedious.xsl, the msg element has its own namespace declaration. Without one, msg would fall into the stylesheet's default namespace, which is the XSLT namespace, and the XSLT processor would reject the stylesheet. The apply-templates element that follows must also redeclare the XSLT namespace because the processor will produce unexpected results without it. (Try it and you'll see.)
OK, OK. This is getting a little confusing. If you had to add a namespace declaration to every literal element and then to following XSLT elements, that would add up to a lot of error-prone typing. So, it's time to start using a prefix with the XSLT namespace.
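To see what a prefix buys you, here is a sketch of tedious.xsl rewritten with the XSLT namespace bound to the prefix xsl. This is an illustrative rewrite, not a file from the example archive:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
 <!-- The literal result element still declares its own namespace -->
 <msg xmlns="http://www.wyeast.net/msg">
  <!-- The prefix keeps this instruction in the XSLT namespace; nothing to redeclare -->
  <xsl:apply-templates/>
 </msg>
</xsl:template>

</xsl:stylesheet>
```

Because every XSLT element carries the prefix, the default namespace declaration on msg no longer affects the instructions nested inside it.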
The conventional prefix for XSLT is
Using the Element Called element
Literal result elements aren't the only way to create elements on the result tree. You can also use the XSLT instruction element named element. The following document, element.xml, is similar to literal.xml, which you saw earlier in this chapter:
<?xml version="1.0"?>
  
<message>You can use the element element to create elements on the result tree.
</message>
Unlike literal.xsl, the stylesheet element.xsl uses element instead of a literal result element to create a new element in the output:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
  
 <xsl:template match="message">
   <xsl:element name="{concat('my', name(  ))}"><xsl:apply-templates/></xsl:element>
 </xsl:template>
  
</xsl:stylesheet>
element has three attributes. The name attribute is required as it obviously specifies a name for the element. In this example, the name attribute uses an attribute value template to compute a name for the element. In other words, the name of the element is computed by using the concat( ) and name( ) functions to contrive a new name based on the name of the current node. This is useful when you don't have the name of a node until you actually perform the transformation (at runtime).
You don't have to use an attribute value template in the value of name—you could use any legal XML name you want in the value. Computing the name, however, is one justification for using element. Another justification is using attribute sets, which you'll learn about presently. Otherwise, you might as well use a literal result element, but the choice remains yours.
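For comparison, here is a sketch of the same template with the name written out directly instead of computed. The name mymessage is chosen only to mirror what the computed version would produce for a message element:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<!-- The name is written directly in the name attribute; no attribute value template -->
<xsl:template match="message">
 <xsl:element name="mymessage"><xsl:apply-templates/></xsl:element>
</xsl:template>

</xsl:stylesheet>
```

With a fixed name like this, a literal result element would do the same job: the content of the template could simply be a mymessage start tag, the apply-templates instruction, and a mymessage end tag.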
element has two other attributes besides name: namespace and use-attribute-sets, which are optional. I'll discuss namespace here, and I'll explain how to work with use-attribute-sets in Section 2.4.1, a little later in this chapter.
The namespace attribute identifies a namespace name to associate with the element. If
Adding Attributes
To add a single, nonliteral attribute to paragraph in a result tree, all you have to do is add an XSLT attribute element as a child of element. The stylesheet attribute.xsl does just that:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
  
 <xsl:template match="/">
   <xsl:element name="paragraph">
    <xsl:attribute name="priority">medium</xsl:attribute>
     <xsl:apply-templates/>
   </xsl:element>
 </xsl:template>
  
</xsl:stylesheet>
Like element, attribute can have name and namespace attributes. Again, the name attribute, which specifies the name of an attribute for the result tree, is required, while namespace is not. The namespace attribute works pretty much like it does in element. The values of both name and namespace can be computed by using an attribute value template, just as in element.
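As a sketch of a computed attribute name, the following stylesheet builds the name from the name of the current node. It assumes a source document whose document element is named message; the resulting attribute name, message-priority, is purely illustrative:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="message">
 <xsl:element name="paragraph">
  <!-- For a message element, the computed attribute name is message-priority -->
  <xsl:attribute name="{concat(name(), '-priority')}">medium</xsl:attribute>
  <xsl:apply-templates/>
 </xsl:element>
</xsl:template>

</xsl:stylesheet>
```

The braces mark the attribute value template; the expression inside them is evaluated at transformation time and its string value becomes the attribute name.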
Apply attribute.xsl to attribute.xml (which contains no attributes) with:
xalan attribute.xml attribute.xsl
to produce a result with a priority attribute:
<?xml version="1.0" encoding="UTF-8"?>
<paragraph priority="medium">You can use the attribute element to create attributes 
on the result tree.</paragraph>
The next stylesheet, attributes.xsl, adds two more attributes to paragraph for a total of three attributes. One of the additional attributes will have a namespace, and one will not:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
  
<xsl:template match="/">
 <xsl:element name="paragraph">
  <xsl:attribute name="priority">medium</xsl:attribute>
  <xsl:attribute name="date">2003-09-23</xsl:attribute>
  <xsl:attribute name="doc:style"
   namespace="http://www.example.com/documents">classic</xsl:attribute>
   <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
  
</xsl:stylesheet>
When transforming attribute.xml with attributes.xsl:
xalan attribute.xml attributes.xsl
it produces this result:
Outputting Comments
Comments allow you to hide advisory text in an XML document. You can also use comments to label documents, or portions of them, which can be useful for debugging. When an XML processor sees a comment, it may ignore or discard it, or it can make the text content of comments available for other kinds of processing. The text in comments is not the same as the text found between element tags, that is, it is not character data. As such, comments can contain characters that are otherwise forbidden, like < and &. XML comments are formed like this:
<!-- This element holds the current date & time -->
Comments are markup and can go anywhere in an XML document, except directly inside the pointy brackets of other kinds of markup. This means, for example, that you can't place a comment inside of a start tag of an element.
The only thing that a comment must not contain is a sequence of two hyphen characters (--), as this pair of characters signals the end of a comment. Other than that, you are free to use any legal XML character in a comment. (Again, to check on what characters are legal in XML, and where they are legal, see Sections 2.2 through 2.4 of the XML specification.)
To insert a comment into a result tree, you can use the XSLT instruction element comment, as demonstrated in the comment.xsl stylesheet:
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
  
<xsl:template match="/">
 <xsl:comment> comment &amp; msg element </xsl:comment>
 <msg><xsl:apply-templates/></msg>
</xsl:template>
  
</xsl:stylesheet>
The output method is XML. If it were text, the comment would not show up in the output. Because comments in XML can contain markup characters, you can include an ampersand in a comment, among otherwise naughty characters, though it must first be represented by an entity reference (&amp;) in the stylesheet.
Outputting Processing Instructions
It must come as no surprise that you can add processing instructions, or PIs, to the result tree with the processing-instruction element. This element is formed like this:
<xsl:processing-instruction name="xml-stylesheet">href="new.css"
   type="text/css"</xsl:processing-instruction>
A processing-instruction element requires one attribute, name, which identifies the target name for the PI. The value of this attribute must be an NCName, and, as such, must not be a QName and cannot contain a colon. In other words, you can't qualify a target name with a namespace.
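To put the instruction in context, here is a minimal sketch of a complete stylesheet that writes the PI ahead of the document element of the result. The doc element is a hypothetical wrapper, not something taken from the book's example files:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
 <!-- Written to the result as: <?xml-stylesheet href="new.css" type="text/css"?> -->
 <xsl:processing-instruction name="xml-stylesheet">href="new.css" type="text/css"</xsl:processing-instruction>
 <!-- doc is a hypothetical element that wraps the rest of the output -->
 <doc><xsl:apply-templates/></doc>
</xsl:template>

</xsl:stylesheet>
```

The value of name becomes the PI target, and the content of the element becomes the text that follows the target.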
The content of the processing-instruction element contains the pair of pseudo-attributes href and type that are necessary to apply the CSS stylesheet processing.css to the resulting XML document:
paragraph {font-size: 24pt; font-family: serif}
code {font-family: monospace}
These rules will apply to the paragraph and code elements in the result tree. Provided that you view the result tree in a browser, any paragraph elements will be rendered with a best-fit serif font, in 24-point type, while any code elements will be rendered in a monospace font. (Courier is an example of a monospace font.) You'll get a chance to see the effects of these style rules later on in this section.
In the example that follows, I'll discuss more than just PIs. I'll also talk about a different kind of content in an XML document, and why you have to use more than one template to get at it. Consider for a moment the following XML document, processing.xml, which contains mixed content:
<?xml version="1.0"?>
  
<message>You can add processing instructions to a document with the <courier>
processing-instruction</courier> element.</message>
The message element in processing.xml contains mixed content. Mixed content freely mixes character data and element content together. That's why you see tags for the courier element mixed with text in
One Final Example
Finally, to wrap things up, here is an example stylesheet that shows you, once again, how to perform most of the techniques discussed in this chapter. The example starts out with the rather short document containing mixed content, final.xml:
<?xml version="1.0"?>
  
<message>You can add processing instructions to a document with the <courier>
processing-instruction</courier> element.</message>
There isn't much to it, but you can augment final.xml with the well-rounded XSLT stylesheet, final.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
  
<xsl:attribute-set name="atts">
 <xsl:attribute name="noteworthy">true</xsl:attribute>
 <xsl:attribute name="priority">medium</xsl:attribute>
</xsl:attribute-set>
  
<xsl:template match="/">
 <xsl:processing-instruction name="xml-stylesheet">href="final.css" type="text/css"
</xsl:processing-instruction>
 <xsl:comment> final.xml as processed with final.xsl </xsl:comment>
 <doc>
  <heading>Final Summary</heading>
  <paragraph>Following is a summary of how you can build documents with XSLT:
</paragraph>
  <paragraph>You can add text either literally or with the <code>text</code> element.
</paragraph>
  <paragraph>You can use literal result elements in stylesheets.</paragraph>
  <xsl:element name="paragraph">You can use <xsl:element name="code">element
</xsl:element> elements in stylesheets.</xsl:element>
  <xsl:comment> you can add a line break &amp; some spaces with the text element 
</xsl:comment>
  <xsl:text>
  
  </xsl:text>
  <xsl:element name="paragraph"><xsl:attribute name="noteworthy">true</xsl:attribute>
You can add attributes to elements with the <xsl:element name="code">attribute
</xsl:element> element.</xsl:element>
  <xsl:element name="paragraph" use-attribute-sets="atts">You can even add sets of 
attributes to elements with the <xsl:element name="code">attribute-set</xsl:element>
 top-level element.</xsl:element>
  <paragraph>You can add comments with the <code>comment</code> element.</paragraph>
  <xsl:element name="paragraph"><xsl:text>And last but not least: </xsl:text>
<xsl:apply-templates select="message"/></xsl:element>
 </doc>
</xsl:template>
  
<xsl:template match="courier">
 <xsl:element name="code"><xsl:apply-templates/></xsl:element>
</xsl:template>
  
</xsl:stylesheet>
Summary
In this chapter, you have learned the techniques that allow you to build a new result tree document. You learned about literal result elements and the XSLT instruction elements text, element, attribute, attribute-set, comment, and processing-instruction. You also learned about XHTML's relationship to HTML, and came to grips with some of the fundamentals of how template rules are evaluated and processed (more to come on that topic). You are now ready to explore ways that you can finely tune a result tree with the output element. You'll find out how in Chapter 3.
Chapter 3: Controlling Output
Chapter 3 shows you how to control the XML, HTML, and text output of an XSLT processor using the XSLT top-level element output. You have seen the output element in previous examples, but I have only discussed 2 of output's 10 attributes so far. I'll talk about each of output's attributes in this chapter.
In this chapter, I'll talk about the results you can expect from different output methods in XML, HTML, text, or custom output. I'll also cover indentation, how to manage XML declarations, document type declarations, CDATA sections, and media types. For more detail, cross-reference this chapter with Section 16 of the XSLT specification.
Be aware that not all XSLT processors adhere strictly to the output element. There are models in which the XSLT processor has no control over the final serialization of the output because the output values are overridden. You will see an example of this type of model when you use the Moxie processor, discussed in Chapter 17.
The Output Method
As you have already seen, the output element has a method attribute. This attribute indicates explicitly the kind of output you want the XSLT processor to produce, namely, XML, HTML, or plain text. These three amigos—the attribute values xml, html, and text—should always be lowercase when used as values for method. (Again, XSLT 2.0 will also support the xhtml output method.)
If you don't assign a value to method, you get a default output method depending on what a stylesheet produces. The default output method for XSLT is XML unless the document element in the result is html. In such a case, the default output method is HTML. The tag name html can be in uppercase, lowercase, or mixed case, but it must not have a namespace URI associated with it (no xmlns attribute).
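The namespace condition is easy to overlook. In the following sketch, which is illustrative and not one of the book's example files, the document element of the result is html, but it is in the XHTML namespace, so a conforming processor should fall back to the XML output method rather than HTML:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- No output element: the processor chooses the default method -->
<xsl:template match="/">
 <!-- html is in the XHTML namespace here, so the default method stays XML -->
 <html xmlns="http://www.w3.org/1999/xhtml">
  <body>
   <p><xsl:apply-templates/></p>
  </body>
 </html>
</xsl:template>

</xsl:stylesheet>
```

Remove the xmlns attribute from html and the default switches to the HTML output method.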

Section 3.1.1.1: Default HTML output

To understand how default HTML works, consider the document name.xml found in examples/ch03 (this is where all the example files mentioned in this chapter are found):
<name>
 <last>Churchill</last>
 <first>Winston</first>
</name>
Then look at default-html.xsl that produces HTML using literal result elements:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   
<xsl:template match="name">
 <html>
  <body>
  <p><xsl:apply-templates select="last"/></p>
  <p><xsl:apply-templates select="first"/></p>
  </body>
 </html>
</xsl:template>
   
</xsl:stylesheet>
Notice that there is no output element in default-html.xsl to tell the processor explicitly what the output method is. Apply this stylesheet to name.xml with Xalan:
xalan -m name.xml default-html.xsl
and it will produce a default HTML result:
<html>
<head>
</head>
<body>
<p>Churchill</p>
<p>Winston</p>
</body>
</html>
The -m command-line option suppresses the META tag that Xalan would normally produce. The result does not have an XML declaration because Xalan evaluated the result as HTML, as it should. The result is also indented (line breaks at start tags, but zero space) because if the output method is HTML, a default value of
Outputting XML
With the XML output method, whether declared explicitly or by default, a compliant XSLT processor produces well-formed XML as output. As you already know, well-formed XML follows the syntax rules outlined in the XML specification—rules such as matching start and end tags, matching quotes around attribute values, proper nesting of elements, and so forth. For example, if you create XML as you did in Chapter 2, the processor will make sure that the XML is well-formed. If it is not, the XSLT processor will report any errors.
The output element helps you to control a number of features relating to XML output, including the XML declaration, document type declarations, and CDATA sections, all of which are discussed in the sections that follow.
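As a preview, here is a sketch of an output element that requests a document type declaration and a CDATA section. The DTD name doc.dtd and the element names doc and code are hypothetical:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- doctype-system asks for a document type declaration pointing at doc.dtd (hypothetical) -->
<!-- cdata-section-elements asks for the text content of code to be written as CDATA -->
<xsl:output method="xml" doctype-system="doc.dtd" cdata-section-elements="code"/>

<xsl:template match="/">
 <doc><code><xsl:apply-templates/></code></doc>
</xsl:template>

</xsl:stylesheet>
```

Both attributes only affect how the result tree is serialized; the result tree itself is the same with or without them.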
As explained in Chapter 1, the XML declaration is optional. You don't have to use it, except under certain circumstances, such as when an encoding declaration is imperative. XSLT allows you to have control over the XML declaration with the output element. With output, you can keep XML declarations from being written to output, change version information, control the encoding declaration, and monitor the standalone declaration. I'll cover all of these features step-by-step in the sections that follow.
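As a quick sketch of the attributes involved, the following output element sets the version, encoding, and standalone values that a processor uses when it writes the XML declaration. The doc element and the choice of ISO-8859-1 are illustrative assumptions, and a given processor may not support every encoding:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- version, encoding, and standalone all feed the XML declaration -->
<xsl:output method="xml" version="1.0" encoding="ISO-8859-1" standalone="yes"/>

<xsl:template match="/">
 <!-- doc is a hypothetical wrapper element -->
 <doc><xsl:apply-templates/></doc>
</xsl:template>

</xsl:stylesheet>
```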

Section 3.2.1.1: Omitting the XML declaration

Most XSLT processors automatically write an XML declaration at the top of the result. If the XML declaration is not essential to your output, you can turn this behavior off by giving output's omit-xml-declaration attribute a value of yes; by default, the value is no when the attribute is not present. The omit-xml-declaration attribute is used in omit.xsl :
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="name">
 <name>
  <family><xsl:apply-templates select="last"/></family>
  <given><xsl:apply-templates select="first"/></given>
 </name>
</xsl:template>
   
</xsl:stylesheet>
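Note that omit.xsl contains two output elements. XSLT 1.0 allows this and merges them into a single effective output element, so the following sketch, with all three attributes on one element, should behave the same way:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- One output element carrying all three attributes -->
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

<xsl:template match="name">
 <name>
  <family><xsl:apply-templates select="last"/></family>
  <given><xsl:apply-templates select="first"/></given>
 </name>
</xsl:template>

</xsl:stylesheet>
```

Splitting the attributes across several output elements is mostly a matter of taste, though it can be handy when stylesheets are imported or included.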
Outputting HTML
You have seen a few examples that produce HTML output. The following HTML example is more complicated than ones you have seen before. This section covers explicit, presentation-oriented HTML output, discussed in Section 16.2 of the XSLT specification. The XML document, wg.xml (Example 3-1), contains the names of the former and current W3C XML Working Group (WG) members at the time of the publication of the first edition of XML 1.0.
Example 3-1. XML document listing the names of the XML Working Group members
<?xml version="1.0"?>
   
<!--
 names of persons acknowledged as current and past members
 of the W3C XML Working Group at the time of the publication
 of the first edition of the XML specification on 1998-02-10
-->
   
<names>
 <name>
  <last>Angerstein</last>
  <first>Paula</first>
 </name>
 <name>
  <last>Bosak</last>
  <first>Jon</first>
 </name>
 <name>
  <last>Bray</last>
  <first>Tim</first>
 </name>
 <name>
  <last>Clark</last>
  <first>James</first>
 </name>
 <name>
  <last>Connolly</last>
  <first>Dan</first>
 </name>
 <name>
  <last>DeRose</last>
  <first>Steve</first>
 </name>
 <name>
  <last>Hollander</last>
  <first>Dave</first>
 </name>
 <name>
  <last>Kimber</last>
  <first>Eliot</first>
 </name>
 <name>
  <last>Magliery</last>
  <first>Tom</first>
 </name>
 <name>
  <last>Maler</last>
  <first>Eve</first>
 </name>
 <name>
  <last>Maloney</last>
  <first>Murray</first>
 </name>
 <name>
  <last>Murata</last>
  <first>Makoto</first>
 </name>
 <name>
  <last>Nava</last>
  <first>Joel</first>
 </name>
 <name>
  <last>O'Connell</last>
  <first>Conleth</first>
 </name>
 <name>
  <last>Paoli</last>
  <first>Jean</first>
 </name>
 <name>
  <last>Sharpe</last>
  <first>Peter</first>
 </name>
 <name>
  <last>Sperberg-McQueen</last>
  <first>C. M.</first>
 </name>
 <name>
  <last>Tigue</last>
  <first>John</first>
 </name>
</names>
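To give a feel for how a document like this could be turned into HTML, here is a minimal sketch of a stylesheet that uses the html output method. It is not the stylesheet the book goes on to develop; the page title and the choice of an unordered list are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Serialize the result tree with HTML rules rather than XML rules -->
<xsl:output method="html"/>

<!-- Build the page skeleton once, at the document element -->
<xsl:template match="names">
 <html>
  <head>
   <title>XML Working Group</title>
  </head>
  <body>
   <ul>
    <xsl:apply-templates select="name"/>
   </ul>
  </body>
 </html>
</xsl:template>

<!-- Each name element becomes one list item: given name, space, family name -->
<xsl:template match="name">
 <li>
  <xsl:value-of select="first"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="last"/>
 </li>
</xsl:template>

</xsl:stylesheet>
```

Applied to wg.xml, this should produce one li element per member, with the comment at the top of the source silently dropped by the built-in template rule for comments.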
The element names last and first fit Western-oriented names, which admittedly is a problem when you are dealing with international names. In other examples in this chapter,
Outputting Text
The text output method tells an XSLT processor that you intend to write plain text to the result. You have already seen simple examples of this earlier in the book. This example shows you how to output programming language text using the text method. If you are not a programmer, this section may be a little tough to follow. You can skip it if programming makes you queasy or if you aren't interested in .NET, although the same approach can be used to generate Java, Visual Basic, COBOL, or the language of your choice.
Now, I'll show you how you can use XSLT to write a program in the C# programming language. The stylesheet csharp.xsl uses the text output method:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
   
<xsl:template match="name">
using System;
using System.Xml;
   
class Name {
   
    static void Main(  ) {
        XmlTextWriter w = new XmlTextWriter(Console.Out);
        w.Formatting = Formatting.Indented;
        w.Indentation = 1;
        w.WriteStartDocument(  );
        w.WriteStartElement("<xsl:value-of select="name(  )"/>");
        w.WriteAttributeString("title", "Mr.");
        w.WriteElementString("family", "<xsl:value-of select="last"/>");
        w.WriteElementString("given", "<xsl:value-of select="first"/>");
        w.WriteEndElement(  );
        w.Close(  );
   
    }
   
}
</xsl:template>
   
</xsl:stylesheet>
This stylesheet uses value-of instruction elements to grab string values from the source tree. The first occurrence of value-of uses the XPath function name( ) to grab the name of the element that the template matches. The template actually matches not just the name of an element node, but a node-set, that is, the set of nodes including the element name and its children. The value-of element, however, returns only the string value of the first node of this node-set. The next two occurrences of value-of capture the text node children of the last and first elements in the source tree, respectively. (You'll learn more about nodes and node-sets in Chapter 4.)
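The first-node behavior of value-of is easier to see in isolation. The following sketch is meant to be run against wg.xml from Example 3-1 rather than name.xml; the stylesheet itself is an illustration and not one of the book's example files.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="names">
 <!-- name() with no argument returns the name of the context node: names -->
 <xsl:value-of select="name()"/>
 <xsl:text>: </xsl:text>
 <!-- name/last selects 18 last elements, but in XSLT 1.0 value-of
      writes the string value of only the first one in document order -->
 <xsl:value-of select="name/last"/>
 <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

With an XSLT 1.0 processor the result should be the single line names: Angerstein. To write every family name, you would need to visit each node in turn, for example with apply-templates or for-each.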
Using a QName Output Method
I have explained the xml, html, and text output methods. You can also use a QName for a value of the method attribute. But there's a catch: if you use a QName, it must be supported as an extension by the XSLT processor that you use with it. (This mechanism allows you to invoke a user-written serializer, such as with a SAX ContentHandler.) This can be useful if you want to produce non-XML formats as your output.
Johannes Döbler's XSLT processor jd.xslt offers several QName values for the method attribute by way of extension. One of them is jd:empty.
An extension value of method must be a QName with a namespace prefix, not an NCName (a name without a colon). Any value other than xml, html, or text is considered an extension and must be qualified with a namespace.
The jd:empty output method, when used together with the jd.xslt processor, produces a result tree but doesn't serialize it. This is useful when you are interested only in measuring the performance of the processor with a given stylesheet. The stylesheet empty.xsl uses output with a method of jd:empty:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="jd:empty" xmlns:jd="http://www.aztecrider.com/xslt"/>
   
<xsl:template match="name">
 <name>
  <family><xsl:apply-templates select="last"/></family>
  <given><xsl:apply-templates select="first"/></given>
 </name>
</xsl:template>
   
</xsl:stylesheet>
The QName jd:empty is associated with the namespace name http://www.aztecrider.com/xslt. You can process empty.xsl against the document name.xml with jd.xslt to see what happens. (For details of how to download, install, and run jd.xslt, see the appendix.) To run it, enter the following at a command or shell prompt using the -verbose switch:
java -jar jdxslt.jar -verbose name.xml empty.xsl
You won't see a result, but the processor will deliver the following information:
jd.xslt processor version 1.4.0
   
java vm              = Sun Microsystems Inc., 1.4.1_01
parser               = org.apache.crimson.parser.XMLReaderImpl
modelbuilder factory = jd.xml.xpath.model.build.ModelBuilderFactory
read stylesheet      = file:C:/LearningXSLT/examples/ch03/empty.xsl
prepare stylesheet   = 180 ms
read xml input       = 10 ms (using normal tree model)
transform input      = 10 ms
max memory usage     = 1.937 MB
Media Types
The last attribute I'll mention is media-type. This attribute allows you to set the media type for the result. Media types are also sometimes called MIME types (MIME is short for Multipurpose Internet Mail Extensions), but since the types apply to more than just email, the term media type is more encompassing.
Here is one example fragment. A media type of application/xml may be specified in an output element like this:
<xsl:output method="xml" media-type="application/xml"/>
The value of this attribute, if you use it, will not be reflected explicitly in the result. In fact, the specification makes no stipulations about whether a processor needs to provide this information to an application. Nevertheless, an application might make the media type information available to a server running HTTP, which could then use it in the Content-Type field of an HTTP header. This was probably the intent of this obscure attribute.
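One plausible use is overriding the default when the xml method is used to write XHTML. The sketch below assumes that the surrounding application actually reads the media type from the processor, which, as noted, the specification does not require; the page content is made up for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Serialize as XML, but advertise the result as XHTML -->
<xsl:output method="xml" media-type="application/xhtml+xml"/>

<!-- Hypothetical template: a minimal XHTML page -->
<xsl:template match="/">
 <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
   <title>Names</title>
  </head>
  <body>
   <p>Generated with the xml output method.</p>
  </body>
 </html>
</xsl:template>

</xsl:stylesheet>
```

The serialized document is the same with or without the attribute. Only an application that asks the processor for the output properties would notice the difference.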
Table 3-2 lists the default media types for the three built-in output methods of XSLT.
Table 3-2: Default media types
Method    Default media type
XML       text/xml
HTML      text/html
Text      text/plain
Summary
This chapter covered the results you get from different output methods, including default and unambiguous XML, HTML, text, or custom output. It also talked about indentation, working with XML declarations, document type declarations, CDATA sections, and media types. In the next chapter, you will learn more details about using XPath to look at XML documents as trees of nodes.
Chapter 4: Traversing the Tree
In the previous three chapters, you have seen a number of examples that use the XML Path Language (XPath). This chapter discusses XPath topics, such as the XPath data model, the difference between patterns and expressions, predicates, the difference between abbreviated and unabbreviated location paths, axes, and node and name tests. (XPath and XSLT functions will be discussed in the next chapter.)
Though it is not exactly light reading, you may want to print a copy of the XPath 1.0 specification. It is a little over 30 pages. You can find it at http://www.w3.org/TR/xpath.
The XPath Data Model
The foundation of XPath is its view of the XML document as a tree with branches called nodes. XPath's data model is a tree data model. The tree model comes to us from traditional computer science. It is a way of organizing or imagining the order of data in a hierarchical or structured way. To illustrate the tree model, Figure 4-1 represents roughly the XML document nodes.xml found in examples/ch04 as a tree of nodes.
Each box in Figure 4-1 represents a node or point in the tree structure of the document. In the XPath data model, a node represents part of an XML document such as the root or starting point of the document, elements, attributes, text, and so on. In the traditional tree model, the lines connecting the nodes are called edges. If a node does not have children, it is called a leaf node. (The terms edge and leaf node are not used in the XPath spec.) If you follow the edges, you are following a path. The nodes in a tree have family relationships: parent-child, ancestor-descendant, sibling, and so forth.
Figure 4-1: A tree of nodes
An XML document, according to the XPath 1.0 data model, can be conceptually described as having seven possible node types:
  • Root (called the document node in XPath 2.0)
  • Element
  • Attribute
  • Text
  • Namespace
  • Comment
  • Processing instruction
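One way to see these seven node types in a real document is to walk the tree and report each node as it is visited. The following stylesheet is a sketch written for illustration, not one of the book's example files. It assumes you want whitespace-only text nodes ignored, hence the strip-space element, and it relies on the namespace axis, which some browser-based processors do not support.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<!-- Assumption: drop whitespace-only text nodes so they are not reported -->
<xsl:strip-space elements="*"/>

<!-- Root node: reported once, then its children are visited -->
<xsl:template match="/">
 <xsl:text>root&#10;</xsl:text>
 <xsl:apply-templates/>
</xsl:template>

<!-- Element nodes, followed by their namespace nodes, attributes, and children -->
<xsl:template match="*">
 <xsl:text>element: </xsl:text>
 <xsl:value-of select="name()"/>
 <xsl:text>&#10;</xsl:text>
 <!-- No pattern can match a namespace node, so visit them with for-each -->
 <xsl:for-each select="namespace::*">
  <xsl:text>namespace: </xsl:text>
  <xsl:value-of select="name()"/>
  <xsl:text>&#10;</xsl:text>
 </xsl:for-each>
 <xsl:apply-templates select="@*|node()"/>
</xsl:template>

<xsl:template match="@*">
 <xsl:text>attribute: </xsl:text>
 <xsl:value-of select="name()"/>
 <xsl:text>&#10;</xsl:text>
</xsl:template>

<xsl:template match="text()">
 <xsl:text>text&#10;</xsl:text>
</xsl:template>

<xsl:template match="comment()">
 <xsl:text>comment&#10;</xsl:text>
</xsl:template>

<xsl:template match="processing-instruction()">
 <xsl:text>processing instruction: </xsl:text>
 <xsl:value-of select="name()"/>
 <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

For a namespace node, name() returns the prefix, so expect every element to report at least the built-in xml prefix, and a default namespace to show up with an empty name.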
You have already encountered nodes of all these types earlier in the book. For further illustration, the file nodes.xml contains at least one occurrence of each of these nodes:
Location Paths
The basic syntax of XPath is the