XSLT
By Doug Tidwell
August 2001
Pages: 478



Table of Contents

Chapter 1: Getting Started
In this chapter, we review the design rationale behind XSLT and XPath and discuss the basics of XML. We also talk about other web standards and how they relate to XSLT and XPath. We conclude the chapter with a brief discussion of how to set up an XSLT processor on your machine so you can work with the examples throughout the book.
Additional content appearing in this section has been removed.
The Design of XSLT
XML has gone from working group to entrenched buzzword in record time. Its flexibility as a language for presenting structured data has made it the lingua franca for data interchange. Early adopters used programming interfaces such as the Document Object Model (DOM) and the Simple API for XML (SAX) to parse and process XML documents. As XML becomes mainstream, however, it's clear that the average web citizen can't be expected to hack Java, Visual Basic, Perl, or Python code to work with documents. What's needed is a flexible, powerful, yet relatively simple, language capable of processing XML.
What's needed is XSLT.
XSLT, the Extensible Stylesheet Language for Transformations, is an official recommendation of the World Wide Web Consortium (W3C). It provides a flexible, powerful language for transforming XML documents into something else. That something else can be an HTML document, another XML document, a Portable Document Format (PDF) file, a Scalable Vector Graphics (SVG) file, a Virtual Reality Modeling Language (VRML) file, Java code, a flat text file, a JPEG file, or most anything you want. You write an XSLT stylesheet to define the rules for transforming an XML document, and the XSLT processor does the work.
The W3C has defined two families of standards for stylesheets. The older and simpler of the two is Cascading Style Sheets (CSS), a mechanism used to define various properties of markup elements. Although CSS can be used with XML, it is most often used to style HTML documents. I can use CSS properties to specify that certain elements be rendered in blue, or in 58-point type, or in boldface. That's all well and good, but there are many things that CSS can't do:
  • CSS can't change the order in which elements appear in a document. If you want to sort certain elements or filter elements based on a certain property, CSS won't do the job.
  • CSS can't do computations. If you want to calculate and output a value (maybe you want to add up the numeric value of all
XML Basics
Almost everything we do in this book deals with XML documents. XSLT stylesheets are XML documents themselves, and they're designed to transform an XML document into something else. If you don't have much experience with XML, we'll review the basics here. For more information on XML, check out Erik T. Ray's Learning XML (O'Reilly, 2001) and Elliotte Rusty Harold and W. Scott Means's XML in a Nutshell (O'Reilly, 2001).
XML's heritage is in the Standard Generalized Markup Language (SGML). Created by Dr. Charles Goldfarb in the 1970s, SGML is widely used in high-end publishing systems. Unfortunately, SGML's perceived complexity prevented its widespread adoption across the industry (SGML also stands for "sounds great, maybe later"). SGML got a boost when Tim Berners-Lee based HTML on SGML. Overnight, the whole computing industry was using a markup language to build documents and applications.
The problem with HTML is that its tags were designed for the interaction between humans and machines. When the Web was invented in the late 1980s, that was just fine. As the Web moved into all aspects of our lives, HTML was asked to do lots of strange things. We've all built HTML pages with awkward table structures, 1-pixel GIFs, and other nonsense just to get the page to look right in the browser. XML is designed to get us out of this rut and back into the world of structured documents.
Whatever its limitations, HTML is the most popular markup language ever created. Given its popularity, why do we need XML? Consider this extremely informative HTML element:
<td>12304</td>
What does this fascinating piece of content represent?
  • Is it the postal code for Schenectady, New York?
  • Is it the number of light bulbs replaced each month in Las Vegas?
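For contrast, here is a rough sketch of how XML could carry the same value together with its meaning. The element names below are hypothetical, chosen only for illustration; they do not come from any standard vocabulary.

```xml
<?xml version="1.0"?>
<!-- Hypothetical markup: the element names are illustrative assumptions -->
<address>
  <city>Schenectady</city>
  <state>New York</state>
  <!-- The tag name, not the reader's guesswork, says what 12304 is -->
  <postal-code>12304</postal-code>
</address>
```

With markup like this, a program that reads the document no longer has to guess what the number represents.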
Installing Xalan
In this section, I'll show you how to install the Xalan XSLT processor. In the next chapter, we'll create our first stylesheet and use it to transform an XML document.
The installation process is pretty simple, assuming you already have a Java Runtime Environment (JRE) installed on your machine. Although very little of the code we look at in this book uses Java, the Xalan XSLT processor itself is written in Java. Once you've installed the JRE, go to http://xml.apache.org/xalan-j/ and download the latest stable build of the code. (If you're feeling brave, feel free to download last night's build instead.)
Once the Xalan .zip or .gzip file is downloaded, unpack it and add three files to your CLASSPATH. The three files include the .jar file for the Xerces parser, the .jar file for the Xalan stylesheet engine itself, and the .jar file for the Bean Scripting Framework. As of this writing, the .jar files are named xerces.jar, xalan.jar, and bsf.jar.
To make sure Xalan is installed correctly, go to a command prompt and type the following command:
java org.apache.xalan.xslt.Process
This is a Java class, so everything is case sensitive. You should see an error message like this:
java org.apache.xalan.xslt.Process
=xslproc options:
    -IN inputXMLURL
   [-XSL XSLTransformationURL]
   [-OUT outputURL]
   [-LXCIN compiledStylesheetFileNameIn]
   [-LXCOUT compiledStylesheetFileNameOutOut]
If you got this error message, you're all set! You're ready for the next chapter, in which we'll build our very first XSLT stylesheet.
Summary
In this chapter, we've gone over the basics of XML and talked about DOM and SAX, two standards that are commonly used by XSLT processors. We also talked about other technology standards and how to install the Xalan stylesheet processor. At this point, you've got everything you need to build and use your first stylesheets, something we'll do in the next chapter.
Chapter 2: The Obligatory Hello World Example
In future chapters, we'll spend much time talking about XSLT, XPath, and various advanced functions used to transform XML documents. First, though, we'll go through a short example to illustrate how stylesheets work.
Goals of This Chapter
By the end of this chapter, you should know:
  • How to create a basic stylesheet
  • How to use a stylesheet to transform an XML document
  • How a stylesheet processor uses a stylesheet to transform an XML document
  • The structure of an XSLT stylesheet
Transforming Hello World
Continuing the tradition of Hello World examples begun by Brian Kernighan and Dennis Ritchie in The C Programming Language (Prentice Hall, 1988), we'll transform a Hello World XML document.
First, we'll look at our sample document. This simple XML document, courtesy of the XML 1.0 specification, contains the famous friendly greeting to the world:
<?xml version="1.0"?>
<greeting>
  Hello, World!
</greeting>
What we'd like to do is transform this fascinating document into something we can view in an ordinary household browser.
Here's an XSLT stylesheet that defines how to transform the XML document:
<xsl:stylesheet 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     version="1.0">
  <xsl:output method="html"/>
        
  <xsl:template match="/">
    <xsl:apply-templates select="greeting"/>
  </xsl:template>
 
  <xsl:template match="greeting">
    <html>
      <body>
        <h1>
          <xsl:value-of select="."/>
        </h1>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
We'll talk about these elements and what they do in just a minute. Keep in mind that the stylesheet is itself an XML document, so we have to follow all of the document rules we discussed in the previous chapter.
To transform the XML document using the XSLT stylesheet, run this command:
java org.apache.xalan.xslt.Process -in greeting.xml -xsl greeting.xsl -out greeting.html
This command transforms the document greeting.xml, using the templates found in the stylesheet greeting.xsl. The results of the transformation are written to the file greeting.html. Check the output file in your favorite browser to make sure the transform worked correctly.
The XSLT processor generates these results:
<html>
<body>
<h1>
  Hello, World!
</h1>
</body>
</html>
How a Stylesheet Is Processed
Now that we're giddy with the excitement of having transformed an XML document, let's discuss the stylesheet and how it works. A big part of the XSLT learning curve is figuring out how stylesheets are processed. To make this clear, we'll go through the steps taken by the stylesheet processor to create the HTML document we want.
Before the XSLT processor can process your stylesheet, it has to read it. Conceptually, it doesn't matter how the XSLT processor stores the information from your stylesheet. For our purposes, we'll just assume that the XSLT processor can magically find anything it needs in our stylesheet. (If you really must know, Xalan uses an optimized table structure to represent the stylesheet; other processors may use that approach or something else.)
Our stylesheet contains three items: an <xsl:output> element that specifies HTML as the output format and two <xsl:template> elements that specify how parts of our XML document should be transformed.
Now that the XSLT processor has processed the stylesheet, it needs to read the document it's supposed to transform. The XSLT processor builds a tree view from the XML source. This tree view is what we'll keep in mind when we build our stylesheets.
Finally, we're ready to begin the actual work of transforming the XML document. The XSLT processor may set some properties based on your stylesheet (in the previous example, it would set its output method to HTML), then it begins processing as follows:
  • Do I have any nodes to process? The nodes to process are represented by something called the context. Initially the context is the root of the XML document, but it changes throughout the stylesheet. We'll talk about the context extensively in the next chapter. (Note: all XSLT processors enjoy being anthropomorphized, so I'll often refer to them this way.)
Stylesheet Structure
As the final part of our introduction to XSLT, we'll look at the contents of the stylesheet itself. We'll explain all the things in our stylesheet and discuss other approaches we could have taken.
The <xsl:stylesheet> element is typically the root element of an XSLT stylesheet.
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    version="1.0">
First of all, the <xsl:stylesheet> element defines the version of XSLT we're using, along with a definition of the xsl namespace. To be compliant with the XSLT specification, your stylesheet should always begin with this element, coded exactly as shown here. Some stylesheet processors, notably Xalan, issue a warning message if your <xsl:stylesheet> element doesn't have these two attributes with these two values. For all examples in this book, we'll start the stylesheet with this exact element, defining other namespaces as needed.
Next, we specify the output method. The XSLT specification defines three output methods: xml, html, and text. We're creating an HTML document, so HTML is the output method we want to use. In addition to these three methods, an XSLT processor is free to define its own output methods, so check your XSLT processor's documentation to see if you have any other options.
<xsl:output method="html"/>
A variety of attributes are used with the different output methods. For example, if you're using method="xml", you can use doctype-public and doctype-system to define the public and system identifiers to be used in the document type declaration. If you're using method="xml" or method="html", you can use the indent attribute to control whether or not the output document is indented. The discussion of the <xsl:output> element in Appendix A has all the details.
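As a minimal sketch of these attributes in use, the stylesheet below requests indented XML output with a document type declaration. The public and system identifiers and the <message> and <text> element names are hypothetical placeholders, not part of the book's examples; the input is assumed to be the Hello World document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The public and system identifiers here are hypothetical placeholders -->
  <xsl:output method="xml"
    doctype-public="-//Example//DTD Message 1.0//EN"
    doctype-system="message.dtd"
    indent="yes"/>

  <xsl:template match="/">
    <!-- <message> is the first element written, so its name is the one
         that appears in the generated document type declaration -->
    <message>
      <text><xsl:value-of select="normalize-space(greeting)"/></text>
    </message>
  </xsl:template>
</xsl:stylesheet>
```

A conforming processor should write a declaration along the lines of <!DOCTYPE message PUBLIC "-//Example//DTD Message 1.0//EN" "message.dtd"> before the <message> element; the exact whitespace and indentation vary from one processor to another.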
Our first template matches "/", the XPath expression for the document's root node.
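One other approach we could have taken, sketched here as an illustration rather than as the book's own stylesheet, is to collapse the two templates into a single template that matches "/" and pulls the greeting text in directly. It assumes the same greeting.xml input.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="html"/>

  <!-- A single template matches "/" and builds the whole page -->
  <xsl:template match="/">
    <html>
      <body>
        <h1>
          <!-- Evaluated from "/", the path "greeting" selects the
               top-level greeting element and outputs its text -->
          <xsl:value-of select="greeting"/>
        </h1>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

For a document this small, the two versions should produce the same page. The two-template form tends to pay off once the source document contains several kinds of elements, because each kind can be given its own rule.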
Sample Gallery
Before we get into more advanced topics, we'll transform our Hello World document in other ways. We'll look through simple stylesheets that convert our small XML document into the following things:
  • A Scalable Vector Graphics (SVG) file
  • A PDF file
  • A Java program
  • A Virtual Reality Modeling Language (VRML) file
Our first example will convert our Hello World document into an SVG file:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml"
    doctype-public="-//W3C//DTD SVG 20001102//EN"
    doctype-system=
      "http://www.w3.org/TR/2000/CR-SVG-20001102/DTD/svg-20001102.dtd"/>

  <xsl:template match="/">
    <svg width="8cm" height="4cm">
      <g>
        <defs>
          <radialGradient id="MyGradient"
            cx="4cm" cy="2cm" r="3cm" fx="4cm" fy="2cm">
            <stop offset="0%" style="stop-color:red"/>
            <stop offset="50%" style="stop-color:blue"/>
            <stop offset="100%" style="stop-color:red"/>
          </radialGradient>
        </defs>
        <rect style="fill:url(#MyGradient); stroke:black"
          x="1cm" y="1cm" width="6cm" height="2cm"/>
        <text x="4cm" y="2.2cm" text-anchor="middle" 
          style="font-family:Verdana; font-size:24; 
          font-weight:bold; fill:black">
          <xsl:apply-templates select="greeting"/>
        </text>
      </g>
    </svg>
  </xsl:template>

  <xsl:template match="greeting">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
As you can see from this stylesheet, most of the code here simply sets up the structure of the SVG document. This is typical of many stylesheets; once you learn what the output format should be, you merely extract content from the XML source document and insert it into the output document at the correct spot. When we transform the Hello World document with this stylesheet, here are the results:
Summary
Although our stylesheets here are trivial, they are much simpler than the corresponding procedural code (written in Visual Basic, C++, Java, etc.) to transform any <greeting> elements similarly. We've gone over the basics of what stylesheets are and how they work.
As we go through this book, we'll demonstrate the incredible range of things you can do in XSLT stylesheets, including:
  • Using logic, branching, and control statements
  • Sorting and grouping elements
  • Linking and cross-referencing elements
  • Creating master documents that embed other XML documents, then sort, filter, group, and format the combined documents
  • Adding new functions to the XSLT stylesheet processor with XSLT's extension mechanism
XSLT has an extremely active user community. To see just how active, visit the XSL-List site at http://www.mulberrytech.com/xsl/xsl-list/index.html.
Before we dive into those topics, we need to talk about XPath, the syntax that describes what parts of an XML document we want to transform into all of these different things.
Chapter 3: XPath: A Syntax for Describing Needles and Haystacks
XPath is a syntax used to describe parts of an XML document. With XPath, you can refer to the first <para> element, the quantity attribute of the <part-number> element, all <first-name> elements that contain the text "Joe", and many other variations. An XSLT stylesheet uses XPath expressions in the match and select attributes of various elements to indicate how a document should be transformed. In this chapter, we'll discuss XPath in all its glory.
XPath is designed to be used inside an attribute in an XML document. The syntax is a mix of basic programming language expressions (such as $x*6) and Unix-like path expressions (such as /sonnet/author/last-name). In addition to the basic syntax, XPath provides a set of useful functions that allow you to find out various things about the document.
One important point, though: XPath works with the parsed version of your XML document. That means that some details of the original document aren't accessible to you from XPath. For example, entity references are resolved before the instructions in our stylesheet are evaluated. CDATA sections are converted to text, as well. That means we have no way of knowing if a text node in an XPath tree was in the original XML document as text, as an entity reference, or as part of a CDATA section. As you get used to thinking about your XML documents in terms of XPath expressions, this situation won't be a problem, but it may confuse you at first.
The XPath Data Model
XPath views an XML document as a tree of nodes. This tree is very similar to a Document Object Model (DOM) tree, so if you're familiar with the DOM, you should have some understanding of how to build basic XPath expressions. (To be precise, this is a conceptual tree; an XSLT processor or anything else that implements the XPath standard doesn't have to build an actual tree.) There are seven kinds of nodes in XPath:
  • The root node (one per document)
  • Element nodes
  • Attribute nodes
  • Text nodes
  • Comment nodes
  • Processing instruction nodes
  • Namespace nodes
We'll talk about all the different node types in terms of the following document:
<?xml version="1.0"?>
<?xml-stylesheet href="sonnet.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>

<!DOCTYPE sonnet [
  <!ELEMENT sonnet (auth:author, title, lines)>
  <!ATTLIST sonnet public-domain CDATA "yes"
            type (Shakespearean | Petrarchan) "Shakespearean">
<!ELEMENT auth:author  (last-name,first-name,nationality,
                        year-of-birth?,year-of-death?)>
<!ELEMENT last-name (#PCDATA)>
<!ELEMENT first-name (#PCDATA)>
<!ELEMENT nationality (#PCDATA)>
<!ELEMENT year-of-birth (#PCDATA)>
<!ELEMENT year-of-death (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT lines (line,line,line,line,
                 line,line,line,line,
                 line,line,line,line,
                 line,line)>
<!ELEMENT line (#PCDATA)>
]>

<!-- Default sonnet type is Shakespearean, the other allowable  -->
<!-- type is "Petrarchan."                                      -->
<sonnet type="Shakespearean">
  <auth:author xmlns:auth="http://www.authors.com/">
    <last-name>Shakespeare</last-name>
    <first-name>William</first-name>
    <nationality>British</nationality>
    <year-of-birth>1564</year-of-birth>
    <year-of-death>1616</year-of-death>
  </auth:author>
  <!-- Is there an official title for this sonnet?  They're     
       sometimes named after the first line.                   -->
  <title>Sonnet 130</title>
  <lines>
    <line>My mistress' eyes are nothing like the sun,</line>
    <line>Coral is far more red than her lips red.</line>
    <line>If snow be white, why then her breasts are dun,</line>
    <line>If hairs be wires, black wires grow on her head.</line>
    <line>I have seen roses damasked, red and white,</line>
    <line>But no such roses see I in her cheeks.</line>
    <line>And in some perfumes is there more delight</line>
    <line>Than in the breath that from my mistress reeks.</line>
    <line>I love to hear her speak, yet well I know</line>
    <line>That music hath a far more pleasing sound.</line>
    <line>I grant I never saw a goddess go,</line>
    <line>My mistress when she walks, treads on the ground.</line>
    <line>And yet, by Heaven, I think my love as rare</line>
    <line>As any she belied with false compare.</line>
  </lines>
</sonnet>
<!-- The title of Sting's 1987 album "Nothing like the sun" is  -->
<!-- from line 1 of this sonnet.                                -->
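To get a feel for the node types in this document, here is a small sketch of a stylesheet that counts the nodes of each kind. It is an illustration only, and it uses nothing beyond standard XPath 1.0 node tests and the count() function.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "/" matches the root node; there is exactly one per document -->
  <xsl:template match="/">
    <xsl:text>elements: </xsl:text>
    <xsl:value-of select="count(//*)"/>
    <xsl:text>&#10;attributes: </xsl:text>
    <xsl:value-of select="count(//@*)"/>
    <xsl:text>&#10;text nodes: </xsl:text>
    <xsl:value-of select="count(//text())"/>
    <xsl:text>&#10;comments: </xsl:text>
    <xsl:value-of select="count(//comment())"/>
    <xsl:text>&#10;processing instructions: </xsl:text>
    <xsl:value-of select="count(//processing-instruction())"/>
    <!-- Namespace nodes are reached only through the namespace axis -->
    <xsl:text>&#10;namespace nodes: </xsl:text>
    <xsl:value-of select="count(//namespace::*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The exact numbers depend on details such as whitespace-only text nodes and attributes defaulted from the DTD, so no expected output is shown. Note that the XML declaration on the first line of the document is not a processing instruction node and is not counted.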
Location Paths
One of the most common uses of XPath is to create location paths. A location path describes the location of something in an XML document. In our examples in the previous chapter, we used location paths in the match and select attributes of various XSLT elements. Those location paths described the parts of the XML document we wanted to work with. Most of the XPath expressions you'll use are location paths, and most of them are pretty simple. Before we dive into the wonders of XPath, we need to discuss the context.
One of the most important concepts in XPath is the context. Everything we do in XPath is interpreted with respect to the context. You can think of an XML document as a hierarchy of directories in a filesystem. In our sonnet example, we could imagine that sonnet is a directory at the root level of the filesystem. The sonnet directory would, in turn, contain directories named auth:author, title, and lines. In this example, the context would be the current directory. If I go to a command line and execute a particular command (such as dir *.js), the results I get vary depending on the current directory. Similarly, the results of evaluating an XPath expression will probably vary based on the context.
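The directory analogy can be sketched with a short stylesheet, assuming the sonnet document shown earlier as input. The same relative expression, count(*), is evaluated twice, and the answer changes because the context node changes.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Context node is the root node: "*" selects only <sonnet> -->
    <xsl:text>Element children of the root node: </xsl:text>
    <xsl:value-of select="count(*)"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="sonnet/lines"/>
  </xsl:template>

  <xsl:template match="lines">
    <!-- Context node is now <lines>: "*" selects its <line> children -->
    <xsl:text>Element children of lines: </xsl:text>
    <xsl:value-of select="count(*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the sonnet, the first count should be 1 (the <sonnet> element) and the second 14 (one for each <line>), much as dir lists different files in different directories.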
Most of the time, we can think of the context as the node in the tree from which any expression is evaluated. To be completely accurate, the context consists of five things:
  • The context node (the "current directory"). The XPath expression is evaluated from this node.
  • Two integers, the context position and the context size. These integers are important when we're processing a group of nodes. For example, we could write an XPath expression that selects all of the <li> elements in a given document. The context size refers to the number of <li> items selected by that expression, and the context position refers to the position of the
Attribute Value Templates
Although they're technically defined in the XSLT specification (in section 7.6.2, to be exact), we'll discuss attribute value templates here. An attribute value template is an XPath expression, enclosed in curly braces, that is evaluated; the result of that evaluation replaces the attribute value template. For example, we could create an HTML <table> element like this:
<table border="{@size}"/>
In this example, the XPath expression @size is evaluated, and its value, whatever that happens to be, is inserted into the output tree as the value of the border attribute. Attribute value templates can be used in any literal result elements in your stylesheet (for HTML elements and other things that aren't part of the XSLT namespace, for example). You can also use attribute value templates in the following XSLT attributes:
  • The name and namespace attributes of the <xsl:attribute> element
  • The name and namespace attributes of the <xsl:element> element
  • The format, lang, letter-value, grouping-separator, and grouping-size attributes of the <xsl:number> element
  • The name attribute of the <xsl:processing-instruction> element
  • The lang, data-type, order, and case-order attributes of the <xsl:sort> element
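Here is a minimal sketch of attribute value templates in both settings: on a literal result element and in the name attribute of <xsl:element>. The <report> source document, with its size and kind attributes and its <item> child, is a hypothetical input invented for this illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Assumes a hypothetical source document such as:
       <report size="2" kind="summary"><item>One</item></report> -->
  <xsl:template match="report">
    <!-- Literal result element: {@size} is replaced by the value
         of the size attribute, giving border="2" for this input -->
    <table border="{@size}">
      <tr>
        <!-- Fixed text and an expression can be mixed in one attribute -->
        <td class="kind-{@kind}">
          <!-- The name attribute of xsl:element also accepts a template;
               with size="2" this creates an h2 element -->
          <xsl:element name="h{@size}">
            <xsl:value-of select="item"/>
          </xsl:element>
        </td>
      </tr>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

Curly braces are only special in attributes that accept attribute value templates. In a select or match attribute the whole value is already an XPath expression or pattern, so braces are not used there.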
XPath Datatypes
An XPath expression returns one of four datatypes:
node-set
Represents a set of nodes. The set can be empty, or it can contain any number of nodes.
boolean
Represents the value true or false. Be aware that the true or false strings have no special meaning or value in XPath; see Section 4.2.1.2 in Chapter 4 for a more detailed discussion of boolean values.
number
Represents a floating-point number. All numbers in XPath and XSLT are implemented as floating-point numbers; the integer (or int) datatype does not exist in XPath and XSLT. Specifically, all numbers are implemented as IEEE 754 floating-point numbers, the same standard used by the Java float and double primitive types. In addition to ordinary numbers, there are five special values for numbers: positive and negative infinity, positive and negative zero, and NaN, the special symbol for anything that is not a number.
string
Represents zero or more characters, as defined in the XML specification.
These datatypes are usually simple, and with the exception of node-sets, converting between types is usually straightforward. We won't discuss these datatypes in any more detail here; instead, we'll discuss datatypes and conversions as we need them to do specific tasks.
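As a quick sketch of these datatypes and a few conversions between them, the stylesheet below writes the string form of several expressions. It should work with any input document; the element name no-such-element is a placeholder assumed not to occur in the source.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- number: integral values are written without a decimal point -->
    <xsl:value-of select="3 + 4"/>                      <!-- 7 -->
    <xsl:text>&#10;</xsl:text>
    <!-- Special numeric values -->
    <xsl:value-of select="1 div 0"/>                    <!-- Infinity -->
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="-1 div 0"/>                   <!-- -Infinity -->
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="0 div 0"/>                    <!-- NaN -->
    <xsl:text>&#10;</xsl:text>
    <!-- string to number: text that is not numeric becomes NaN -->
    <xsl:value-of select="number('abc')"/>              <!-- NaN -->
    <xsl:text>&#10;</xsl:text>
    <!-- string to boolean: any non-empty string is true -->
    <xsl:value-of select="boolean('false')"/>           <!-- true -->
    <xsl:text>&#10;</xsl:text>
    <!-- node-set to boolean: an empty node-set is false -->
    <xsl:value-of select="boolean(//no-such-element)"/> <!-- false -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The boolean('false') line illustrates the warning above: the string "false" has no special meaning, so it converts to true like any other non-empty string.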
The XPath View of an XML Document
Before we leave the subject of XPath, we'll look at a stylesheet that generates a pictorial view of a document. The stylesheet has to distinguish between all of the different XPath node types, including any rarely used namespace nodes.
Figure 3-1 shows the output of our stylesheet. In this graphical view of the document, the nested HTML tables illustrate which nodes are contained inside of others, as well as the sequence in which these nodes occur in the original document. In the section of the document visible in Figure 3-1, the root of the document contains, in order, two processing instructions and two comments, followed by the <sonnet> element. The <sonnet> element, in turn, contains two attributes and an <auth:author> element. The <auth:author> element contains a namespace node and an element. Be aware that this stylesheet has its limitations; if you throw a very large XML document at it, it will generate an HTML file with many levels of nested tables—probably more levels than your browser can handle.
Figure 3-1: XPath tree view of an XML document
Now we'll take a look at the stylesheet and how it works. The stylesheet creates a number of nested tables to illustrate the XPath view of the document. We begin by writing the basic HTML elements to the output stream and creating a legend for our nested tree view:
  <xsl:template match="/">
    <html>
      <head>
        <title>XPath view of your document</title>
        <style type="text/css">
          <xsl:comment>
            span.literal         { font-family: Courier, monospace; }
          </xsl:comment>
        </style>
      </head>
      <body>
        <h1>XPath view of your document</h1>
        <p>The structure of your document (as defined by 
           the XPath standard) is outlined below.</p>
        <table cellspacing="5" cellpadding="2" border="0">
          <tr>
            <td colspan="7">
              <b>Node types:</b>
            </td>
          </tr>
          <tr>
            <td bgcolor="#99CCCC"><b>root</b></td>
            <td bgcolor="#CCCC99"><b>element</b></td>
            <td bgcolor="#FFFF99"><b>attribute</b></td>
            <td bgcolor="#FFCC99"><b>text</b></td>
            <td bgcolor="#CCCCFF"><b>comment</b></td>
            <td bgcolor="#99FF99"><b>processing instruction</b></td>
            <td bgcolor="#CC99CC"><b>namespace</b></td>
          </tr>
        </table>
        <br />
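The excerpt stops before the templates that handle each node type. As a rough sketch of the idea only (this is not the book's stylesheet, and it writes plain text rather than nested HTML tables), the following stylesheet labels each kind of XPath node. It assumes XSLT 1.0, where no match pattern can match a namespace node, so namespace nodes are visited with <xsl:for-each> on the namespace axis:

```xml
<?xml version="1.0"?>
<!-- Sketch only: labels each XPath node type as a line of plain text. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- The root node is matched by "/" -->
  <xsl:template match="/">
    <xsl:text>root&#10;</xsl:text>
    <xsl:apply-templates select="node()"/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:text>element: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Attribute and namespace nodes are not children of the element,
         so they have to be selected explicitly. -->
    <xsl:apply-templates select="@*"/>
    <xsl:for-each select="namespace::*">
      <xsl:text>namespace: </xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text> = </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:apply-templates select="node()"/>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:text>attribute: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:text>text: </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="comment()">
    <xsl:text>comment&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="processing-instruction()">
    <xsl:text>processing instruction: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```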
Summary
We've covered the basics of XPath. Hopefully, at this point you're comfortable with the idea of writing XPath expressions to describe parts of an XML document. As we go through the following chapters, you'll see XPath expressions used in a variety of ways, all of which build on the basics we've discussed here. You'll probably spend most of your debugging time working on the XPath expressions in your stylesheets. Very few of the things we'll do in the rest of the book are possible without precise XPath expressions.
Chapter 4: Branching and Control Elements
So far, we've done some straightforward transformations and we've been able to do some reasonably sophisticated things. To do truly useful work, though, we'll need to use logic in our stylesheets. In this chapter, we'll discuss the XSLT elements that allow you to do just that. Although you'll see several XML elements that look like constructs from other programming languages, they're not exactly the same. As we go along, we'll discuss what makes XSLT different and how to do common tasks with your stylesheets.
Goals of This Chapter
By the end of this chapter, you should:
  • Know the XSLT elements used for branching and control
  • Understand the differences between XSLT's branching elements and similar constructs in other programming languages
  • Know how to invoke XSLT templates by name and how to pass parameters to them, if you want
  • Know how to use XSLT variables
  • Understand how to use recursion to get around the "limitations" of XSLT's branching and control elements
Branching Elements of XSLT
Three XSLT elements are used for branching: <xsl:if>, <xsl:choose>, and <xsl:for-each>. The first two are much like the if and case statements you may be familiar with from other languages, while the for-each element is significantly different from the for or do-while structures in other languages. We'll discuss all of them here.
The <xsl:if> element looks like this:
<xsl:if test="count(zone) &gt; 2">
  <xsl:text>Applicable zones: </xsl:text>
  <xsl:apply-templates select="zone"/>
</xsl:if>
The <xsl:if> element, surprisingly enough, implements an if statement. The element has only one attribute, test. If the value of test evaluates to the boolean value true, then all elements inside the <xsl:if> are processed. If test evaluates to false, then the contents of the <xsl:if> element are ignored. (If you want to implement an if-then-else statement, check out the <xsl:choose> element described in the next section.)
Notice that we used &gt; instead of > in the attribute value. You're always safe using &gt; here, although a literal > is also legal inside an XML attribute value, so XSLT processors accept it as well. If you need to use the less-than operator (<), you'll have to use the &lt; entity, because a literal < is not allowed in an attribute value. The same holds true for the less-than-or-equal operator (<=), which must be written &lt;=; the greater-than-or-equal operator (>=) can be written &gt;= to be safe. See Section B.4.2 for more information on this topic.
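As a hedged illustration of both points (an if-then-else built with <xsl:choose>, and a comparison written with the &lt; entity), here is a minimal sketch. The <zones> wrapper element and the wording of the messages are assumptions made for the example; only the <zone> element comes from the sample above.

```xml
<?xml version="1.0"?>
<!-- Sketch: assumes input like <zones><zone>A</zone><zone>B</zone></zones> -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="zones">
    <xsl:choose>
      <!-- The less-than-or-equal operator is written with the lt entity -->
      <xsl:when test="count(zone) &lt;= 2">
        <xsl:text>Two zones or fewer</xsl:text>
      </xsl:when>
      <!-- xsl:otherwise plays the role of "else" -->
      <xsl:otherwise>
        <xsl:text>Applicable zones: </xsl:text>
        <xsl:apply-templates select="zone"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Write each zone, separated by commas -->
  <xsl:template match="zone">
    <xsl:value-of select="."/>
    <xsl:if test="position() != last()">
      <xsl:text>, </xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With the two-zone input assumed in the comment, the first branch is taken; with three or more <zone> elements, the <xsl:otherwise> branch lists them.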

Section 4.2.1.1: Converting to boolean values

The <xsl:if> element is pretty simple, but it's the first time we've had to deal with boolean values. These values will come up later, so we might as well discuss them here. Attributes like the test attribute of the <xsl:if> element convert whatever their values happen to be into a boolean value. If that boolean value is
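For reference, the XPath 1.0 conversion rules are short: a number is true unless it is zero or NaN, a string is true unless it is empty, and a node-set is true unless it is empty. The sketch below exercises each rule; the <zone> element is reused from the earlier sample, and the output text is invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch: shows how test attributes convert values to booleans. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Numbers: 0 and NaN are false; every other number is true -->
    <xsl:if test="0">never written</xsl:if>
    <xsl:if test="-1">the number -1 is true&#10;</xsl:if>

    <!-- Strings: only the empty string is false, so 'false' is true -->
    <xsl:if test="''">never written</xsl:if>
    <xsl:if test="'false'">the string 'false' is true&#10;</xsl:if>

    <!-- Node-sets: true when at least one node is selected -->
    <xsl:if test="//zone">the document contains a zone element&#10;</xsl:if>
    <xsl:if test="not(//zone)">the document contains no zone elements&#10;</xsl:if>
  </xsl:template>
</xsl:stylesheet>
```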
Invoking Templates by Name
Up to this point, we've always used XSLT's <xsl:apply-templates> element to invoke other templates. You can think of this as a limited form of polymorphism; a single instruction is invoked a number of times, and the XSLT processor uses each node in the node-set to determine which <xsl:template> to invoke. Most of the time, this is what we want. However, sometimes we want to invoke a particular template. XSLT allows us to do this with the <xsl:call-template> element.
To invoke a template by name, two things have to happen:
  • The template you want to invoke has to have a name.
  • You use the <xsl:call-template> element to invoke the named template.
Here's how to do this. Say we have a template named createMasthead that creates the masthead of a web page. Whenever we create an HTML page for our web site, we want to invoke the createMasthead template to create the masthead. Here's what our stylesheet would look like:
<xsl:template name="createMasthead">
  <!-- interesting stuff that generates the masthead goes here -->
</xsl:template>
...
<xsl:template match="/">
  <html>
    <head>
      <title><xsl:value-of select="title"/></title>
    </head>
    <body>
      <xsl:call-template name="createMasthead"/>
...
Named templates are extremely useful for defining commonly used markup. For example, say you're using an XSLT stylesheet to create web pages with a particular look and feel. You can write named templates that create the header, footer, navigation areas, or other items that define how your web page will look. Every time you need to create a web page, simply use <xsl:call-template> to invoke those templates and create the look and feel you want.
Even better, if you put those named templates in a separate stylesheet and import the stylesheet (with either <xsl:import> or <xsl:include>), you can create a set of stylesheets that generate the look and feel of the web site you want. If you decide to redesign your web site, redesign the stylesheets that define the common graphical and layout elements. Change those stylesheets, regenerate your web site, and voila! You will see an instantly updated web site. (See Chapter 9 for an example.)
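To make the idea concrete, here is a minimal, self-contained sketch. The createFooter template, the <page>, <title>, and <para> input elements, and the masthead markup are all invented for illustration. In a real site the two named templates could live in a separate file (say, a hypothetical layout.xsl) and be pulled in with <xsl:include href="layout.xsl"/>.

```xml
<?xml version="1.0"?>
<!-- Sketch: assumes input like
     <page><title>Home</title><para>Hello</para></page> -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Shared look and feel, invoked by name (markup is hypothetical) -->
  <xsl:template name="createMasthead">
    <div class="masthead"><h1>Our Web Site</h1></div>
  </xsl:template>

  <xsl:template name="createFooter">
    <div class="footer"><p>Generated with XSLT</p></div>
  </xsl:template>

  <!-- Every page gets the same masthead and footer -->
  <xsl:template match="/page">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <xsl:call-template name="createMasthead"/>
        <xsl:apply-templates select="para"/>
        <xsl:call-template name="createFooter"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```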
Parameters
The XSLT <xsl:param> and <xsl:with-param> elements allow you to pass parameters to a template. You can pass parameters with either the <xsl:call-template> element or the <xsl:apply-templates> element; we'll discuss the details in this section.
To define a parameter in a template, use the <xsl:param> element. Here's an example of a template that defines two parameters:
<xsl:template name="calculateArea">
  <xsl:param name="width"/>
  <xsl:param name="height"/>

  <xsl:value-of select="$width * $height"/>
</xsl:template>
Conceptually, this is a lot like writing code in a traditional programming language, isn't it? Our template here defines two parameters, width and height, and outputs their product.
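The caller's side is not shown above. As a sketch, a template like this one can be invoked with <xsl:call-template>, supplying each value through <xsl:with-param>. The <rectangle> element and its width and height attributes are assumptions made for the example.

```xml
<?xml version="1.0"?>
<!-- Sketch: assumes input like <rectangle width="4" height="5"/> -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- The named template: two parameters, outputs their product -->
  <xsl:template name="calculateArea">
    <xsl:param name="width"/>
    <xsl:param name="height"/>

    <xsl:value-of select="$width * $height"/>
  </xsl:template>

  <xsl:template match="rectangle">
    <xsl:text>Area: </xsl:text>
    <xsl:call-template name="calculateArea">
      <!-- Each xsl:with-param is paired by name with an xsl:param -->
      <xsl:with-param name="width" select="@width"/>
      <xsl:with-param name="height" select="@height"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input in the comment, the output is Area: 20.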
If you want, you can define a default value for a parameter. There are two ways to define a default value; the simplest is to use a select attribute on the <xsl:param> element:
<xsl:template name="addTableCell">
  <xsl:param name="bgColor" select="'blue'"/>
  <xsl:param name="width" select="150"/>
  <xsl:param name="content"/>
  <td width="{$width}" bgcolor="{$bgColor}">
    <xsl:apply-templates select="$content"/>
  </td>
</xsl:template>
In this example, the default values of the parameters bgColor and width are 'blue' and 150, respectively. If we invoke this template without specifying values for these parameters, the default values are used. Also notice that we generated the values of the width and bgcolor attributes of the HTML <td> tag with attribute value templates, the values in curly braces. For more information, see Section 3.3 in Chapter 3.
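Here is a hedged sketch of how the defaults behave when the template is called. It repeats the template so that the example is self-contained; the <row> and <cell> input elements and the colour 'yellow' are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch: assumes input like <row><cell>one</cell><cell>two</cell></row> -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template name="addTableCell">
    <xsl:param name="bgColor" select="'blue'"/>
    <xsl:param name="width" select="150"/>
    <xsl:param name="content"/>
    <td width="{$width}" bgcolor="{$bgColor}">
      <xsl:apply-templates select="$content"/>
    </td>
  </xsl:template>

  <xsl:template match="row">
    <table>
      <tr>
        <!-- No bgColor or width supplied: the defaults (blue, 150) apply -->
        <xsl:call-template name="addTableCell">
          <xsl:with-param name="content" select="cell[1]/text()"/>
        </xsl:call-template>
        <!-- Only bgColor is overridden; width still defaults to 150 -->
        <xsl:call-template name="addTableCell">
          <xsl:with-param name="bgColor" select="'yellow'"/>
          <xsl:with-param name="content" select="cell[2]/text()"/>
        </xsl:call-template>
      </tr>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

The content parameter has to be a node-set here, because the template passes it to <xsl:apply-templates>.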
Notice that in the previous sample, we put single quotes around the value blue, but we didn't do it around the value 150. Without the single quotes around blue, the XSLT processor assumes we want to select all the <blue> elements in the current context, which is probably not what we want. The XSLT processor is clever enough to realize that the value
Variables
If we use logic to control the flow of our stylesheets, we'll probably want to store temporary results along the way. In other words, we'll need to use variables. XSLT provides the <xsl:variable> element, which allows you to store a value and associate it with a name.
The <xsl:variable> element can be used in three ways. The simplest form of the element creates a new variable whose value is an empty string (""). Here's how it looks:
<xsl:variable name="x"/>
This element creates a new variable named x, whose value is an empty string. (Please hold your applause until the end of the section.)
You can also create a variable by adding a select attribute to the <xsl:variable> element:
<xsl:variable name="favouriteColour" select="'blue'"/>
In this case, we've set the value of the variable to be the string "blue". Notice that we put single quotes around the value. These quotes ensure that the literal value blue is used as the value of the variable. If we had left out the single quotes, this would mean the value of the variable is that of all the <blue> elements in the current context, which definitely isn't what we want here.
You don't have to put single quotes around a literal value that is a number. XPath treats an unquoted token such as 35 as a number, and because the XML specification states that XML element names can't begin with a digit, it can't be mistaken for an element name. If I say the value should be 35, Xalan, XT, and Saxon all assume that I mean 35 as a literal value, not as an element name. A further aside: written without quotes, the value is the number 35; written as '35', it is the string "35", although either can be converted to the other easily.
The third way to use the <xsl:variable> element is to put content inside it. Here's a brief example:
<xsl:variable name="y">
  <xsl:choose>
    <xsl:when test="$x &gt; 7">
      <xsl:text>13</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>15</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:variable>
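To see the three forms side by side, here is a small sketch. Unlike the fragments above, it gives x a numeric value through select so that the test inside y has something to compare; the variable named empty and the output text are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch: the three ways of creating a variable, in one template. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Form 1: no select and no content; the value is an empty string -->
    <xsl:variable name="empty"/>

    <!-- Form 2: a select attribute supplies the value -->
    <xsl:variable name="x" select="10"/>

    <!-- Form 3: the content of the element supplies the value -->
    <xsl:variable name="y">
      <xsl:choose>
        <xsl:when test="$x &gt; 7">
          <xsl:text>13</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>15</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:text>empty has length </xsl:text>
    <xsl:value-of select="string-length($empty)"/>
    <xsl:text>&#10;y is </xsl:text>
    <xsl:value-of select="$y"/>
    <xsl:text>&#10;y + 1 is </xsl:text>
    <xsl:value-of select="$y + 1"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because x is 10, the first branch is chosen: the stylesheet reports a length of 0 for empty, 13 for y, and 14 for y + 1.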
Using Recursion to Do Most Anything
Writing an XSLT stylesheet is different from programming in other languages. If you didn't believe that before, you probably do now. We'll finish this chapter with a couple of examples that demonstrate how to use recursion to solve the kinds of problems that you're probably used to solving with procedural programming languages.
To demonstrate how to use recursion to solve problems, we'll write a string replace function. This is sometimes useful when you need to escape certain characters or substrings in your output. The stylesheet we'll develop here transforms an XML document into a set of SQL statements that will be executed at a Windows command prompt. We have to do several things:
Put a caret (^) in front of all ampersands (&)
On the Windows NT and Windows 2000 command prompt, the ampersand means that the current command has ended and another is beginning. For example, this command creates a new directory called xslt and changes the current directory to the newly created one:
mkdir xslt & chdir xslt
If we create a SQL statement that contains an ampersand, we'll need to escape the ampersand so it's processed as a literal character, not as an operator. If we insert the value Jones & Son as the value of the company field in a row of the database, we need to change it to Jones ^& Son before we try to run the SQL command.
Put a caret (^) in front of all vertical bars (|)
The vertical bar is the pipe operator on Windows systems, so we need to escape it if we want it interpreted as literal text instead of an operator.
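The book's own replace template is not part of this excerpt. As a minimal sketch of the recursive technique, the named template below (replaceString, a hypothetical name, as are its parameters text, from, and to) writes everything before the first match, writes the replacement, and then calls itself on the rest of the string. The <company> input element is also an assumption.

```xml
<?xml version="1.0"?>
<!-- Sketch: assumes input like <company>Jones &amp; Son</company> -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="replaceString">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>
    <xsl:choose>
      <!-- Guard against an empty search string: contains() is always
           true for '', which would make the recursion endless. -->
      <xsl:when test="$from != '' and contains($text, $from)">
        <xsl:value-of select="substring-before($text, $from)"/>
        <xsl:value-of select="$to"/>
        <!-- Recurse on whatever follows the first match -->
        <xsl:call-template name="replaceString">
          <xsl:with-param name="text"
            select="substring-after($text, $from)"/>
          <xsl:with-param name="from" select="$from"/>
          <xsl:with-param name="to" select="$to"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Base case: nothing left to replace -->
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="company">
    <xsl:call-template name="replaceString">
      <xsl:with-param name="text" select="string(.)"/>
      <xsl:with-param name="from" select="'&amp;'"/>
      <xsl:with-param name="to" select="'^&amp;'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input, the output is Jones ^& Son. To escape vertical bars as well, the result of the first call could be captured in an <xsl:variable> and passed as the text parameter of a second call.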
A Stylesheet That Emulates a for Loop
We stressed earlier that the <xsl:for-each> element is not a for loop; it's merely an iterator across a group of nodes. However, if you simply must implement a for loop, there's a way to do it. (Get ready to use recursion, though.)
Our design here is to create a named template that will take some arguments, then act as a for loop processor. If you think about a traditional for loop, it has several properties:
  • One or more initialization statements. These statements are processed before the for loop begins. Typically the initialization statements refer to an index variable that is used to determine whether the loop should continue.
  • An increment statement. This statement specifies how the index variable should be updated after each pass through the loop.
  • A boolean expression. If the expression is true, the loop continues; if it is ever false, the loop exits.
Let's take a sample from the world of Java and C++:
for (int i=0; i<length; i++)
In this scintillating example, the initialization statement is i=0, the index variable (the variable whose value determines whether we're done or not) is i, the boolean expression we use to test whether the loop should continue is i<length, and the increment statement is i++.
For our purposes here, we're going to make several simplifying assumptions. (Feel free, dear reader, to make the example as complicated as you wish.) Here are the shortcuts we'll take:
  • Rather than use an initialization statement, we'll require the caller to set the value of the local variable i when it invokes our for loop processor.
  • Rather than specify an increment statement such as i++, we'll require the caller to set the value of the local variable
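The rest of the design is cut off in this excerpt, so the following is only a sketch of the general technique, not the book's template. It assumes three parameters (i, increment, and testValue, the last two being hypothetical names), supports only the less-than test, and writes the index as plain text for the loop body.

```xml
<?xml version="1.0"?>
<!-- Sketch: a recursive named template that behaves like
     for (i = 0; i < 3; i = i + 1) -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="for-loop">
    <xsl:param name="i" select="0"/>
    <xsl:param name="increment" select="1"/>
    <xsl:param name="testValue" select="3"/>

    <!-- The boolean expression: keep going while i is below testValue -->
    <xsl:if test="$i &lt; $testValue">
      <!-- The loop body -->
      <xsl:text>i = </xsl:text>
      <xsl:value-of select="$i"/>
      <xsl:text>&#10;</xsl:text>

      <!-- The increment: variables can't be reassigned, so the
           template calls itself with the updated index instead. -->
      <xsl:call-template name="for-loop">
        <xsl:with-param name="i" select="$i + $increment"/>
        <xsl:with-param name="increment" select="$increment"/>
        <xsl:with-param name="testValue" select="$testValue"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- The caller sets the starting value of i -->
  <xsl:template match="/">
    <xsl:call-template name="for-loop">
      <xsl:with-param name="i" select="0"/>
      <xsl:with-param name="increment" select="1"/>
      <xsl:with-param name="testValue" select="3"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

This writes three lines, for i = 0, 1, and 2. The increment must be positive or the recursion never ends, and a very large testValue can exhaust the processor's recursion depth.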
A Stylesheet That Generates a Stylesheet That Emulates a for Loop
We've emulated a for loop now, but what about a stylesheet that generates another stylesheet that emulates the for loop? As we beat this dead horse one more time, we'll create a stylesheet that generates the iteration for us, along with an XML syntax that automates the process.
Here's the XML template we'll use to generate the stylesheet:
<?xml version="1.0"?>
<html xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <head>
    <title>Text generated by our for loop processor</title>
  </head>
  <body>
    <h1>Text generated by our for loop processor</h1>
    <table border="1">
      <tr>
        <th>Iteration #</th>
        <th>Value of <i>i</i></th>
      </tr>
      <for-loop index-variable="0" increment="1"
       operator="&lt;=" test-value="10">
        <tr>
          <td align="center">
            <xsl:value-of select="$iteration"/>
          </td>
          <td align="center">
            <xsl:value-of select="$i"/>
          </td>
        </tr>
      </for-loop>
    </table>
  </body>
</html>
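The generating stylesheet itself is not part of this excerpt. One way to approach it, offered here as an assumption and not as the book's solution, is XSLT's <xsl:namespace-alias> element: elements written with a stand-in prefix in the generator come out as XSLT elements in the result. In the sketch below, the out prefix, its urn:example:generated-xslt namespace, and the define mode are hypothetical names. The sketch assumes a single <for-loop> element and an operator attribute written with the &lt; entity, so that the XML template is well-formed.

```xml
<?xml version="1.0"?>
<!-- Sketch: turns the XML template into a stylesheet that emulates
     the for loop with a recursive named template. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:out="urn:example:generated-xslt">

  <!-- Elements with the out prefix become XSLT elements in the result -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <out:stylesheet version="1.0">
      <out:template match="/">
        <xsl:apply-templates select="*"/>
      </out:template>
      <xsl:apply-templates select="//for-loop" mode="define"/>
    </out:stylesheet>
  </xsl:template>

  <!-- Copy everything else (HTML and xsl:value-of elements) unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Where the for-loop element appears, generate the first call -->
  <xsl:template match="for-loop">
    <out:call-template name="for-loop">
      <out:with-param name="i" select="{@index-variable}"/>
      <out:with-param name="iteration" select="1"/>
    </out:call-template>
  </xsl:template>

  <!-- Generate the recursive template; its body is the loop's content -->
  <xsl:template match="for-loop" mode="define">
    <out:template name="for-loop">
      <out:param name="i"/>
      <out:param name="iteration"/>
      <out:if test="$i {@operator} {@test-value}">
        <xsl:apply-templates select="node()"/>
        <out:call-template name="for-loop">
          <out:with-param name="i" select="$i + {@increment}"/>
          <out:with-param name="iteration" select="$iteration + 1"/>
        </out:call-template>
      </out:if>
    </out:template>
  </xsl:template>
</xsl:stylesheet>
```

The curly braces are attribute value templates evaluated by the generator, so the generated test attribute reads $i <= 10 (serialized with the &lt; entity), and $i and $iteration are left for the generated stylesheet to evaluate.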