Transforming Documents with XSL/XSLT

Earlier in this chapter, we used a Transformer object to copy a DOM representation of an example back to XML text. We mentioned that we were not really tapping the potential of the Transformer. Now, we’ll give you the full story.

The javax.xml.transform package is the API for using the XSL/XSLT transformation language. XSL stands for Extensible Stylesheet Language. Like Cascading Style Sheets (CSS) for HTML, XSL allows us to “mark up” XML documents by adding tags that provide presentation information. XSL Transformation (XSLT) takes this further by adding the ability to completely restructure the XML and produce arbitrary output. XSL and XSLT together make up their own programming language for processing an XML document as input and producing another (usually XML) document as output. (From here on, we’ll refer to them collectively as XSL.)

XSL is extremely powerful, and new applications for its use arise every day. For example, consider a website that is frequently updated and that must provide access to a variety of mobile devices and traditional browsers. Rather than recreating the site for each of these platforms, you can use XSL to transform the content to an appropriate format for each one. More generally, rendering content from XML is simply a better way to preserve your data and keep it separate from your presentation information. XSL can be used to render an entire website in different styles from files containing “pure data” in XML, much like a database. Multilingual sites also benefit from using XSL to lay out text in different ways for different audiences.

You can probably guess the caveat that we’re going to issue: XSL is a big topic worthy of its own books (see, for example, O’Reilly’s Java and XSLT by Eric Burke), and we can only give you a taste of it here. Furthermore, some people find XSL difficult to understand at first glance because it requires thinking in terms of recursively processing document tags. In recent years, much of the impetus behind XSL as a way to produce web-based content has fallen away in favor of using more JavaScript on the client. However, XSL remains a powerful way to transform XML and is widely used in other document-oriented applications.

XSL Basics

XSL is an XML-based standard, so it should come as no surprise that stylesheets are themselves written in XML. An XSL stylesheet is an XML document using special tags defined by the XSL namespace to describe the transformation. The most basic XSL operations involve matching parts of the input XML document and generating output based on their contents. One or more XSL templates live within the stylesheet and are called in response to tags appearing in the input. XSL is often used in a purely input-driven way, where input XML tags trigger output in the order in which they appear, using only the information they contain. But more generally, the output can be constructed from arbitrary parts of the input, drawing from it like a database, composing elements and attributes. The XSLT transformation part of XSL adds things like conditionals and iteration to this mix, which enable any kind of output to be generated based on the input.

An XSL stylesheet contains a stylesheet tag as its root element. By convention, the stylesheet defines a namespace prefix xsl for the XSL namespace. Within the stylesheet are one or more template tags, each containing a match attribute that describes the element upon which it operates.

<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:template match="/">
     I found the root of the document!
   </xsl:template>

</xsl:stylesheet>

When a template matches an element, it has an opportunity to handle all the children of the element. The simple stylesheet shown here has one template that matches the root of the input document and simply outputs some plain text. By default, input not matched is simply copied to the output with its tags stripped (HTML convention). But here we match the root so we consume the entire input and nothing but our message appears on the output.
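To see both behaviors side by side, the following sketch applies two stylesheets to the same input: one with no templates at all, which falls back on the default rules, and one containing only the root-matching template. The class name DefaultRulesDemo, the transform() helper, and the tiny sample document are assumptions made for illustration. The xsl:output element is also an addition here, used only to keep the result free of an XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DefaultRulesDemo {
    // Hypothetical stand-in for zooinventory.xml (assumed content).
    static final String XML =
        "<inventory><animal><name>Song Fang</name>"
        + "<species>Giant Panda</species></animal></inventory>";

    // No templates: only the default rules apply.
    static final String NO_TEMPLATES = String.join("\n",
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>",
        "  <xsl:output method='text'/>",
        "</xsl:stylesheet>");

    // One template that matches the root and never looks at its children.
    static final String ROOT_ONLY = String.join("\n",
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='/'>",
        "    I found the root of the document!",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies a stylesheet (given as a string) to an XML string.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + transform(NO_TEMPLATES, XML) + "]");
        System.out.println("[" + transform(ROOT_ONLY, XML) + "]");
    }
}
```

The first result should contain only the text of the input elements, with no tags; the second should contain only the message from the template.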

The match attribute can refer to elements using the XPath notation that we described earlier. This is a hierarchical path starting with the root element. For example, match="/inventory/animal" would match only the animal elements from our zooinventory.xml file. In XSL, the path may be absolute (starting with “/”) or relative, in which case, the template detects whenever that element appears in any subcontext (equivalent to “//” in XPath).

Within the template, we can put whatever we want as long as it is well-formed XML (if not, we can use a CDATA section or XInclude). But the real power comes when we use parts of the input to generate output. The XSL value-of tag is used to output the content of an element or of one of its children. For example, the following template would match an animal element and output the value of its name child element:

<xsl:template match="animal">
   Name: <xsl:value-of select="name"/>
</xsl:template>

The select attribute uses an XPath expression relative to the current node. In this case, we tell it to print the value of the name element within animal. We could have used a relative path to a more deeply nested element within animal or even an absolute path to another part of the document. To refer to the “current” element (in this case, the animal element itself), a select expression can use “.” as the path. The select expression can also retrieve attributes from the elements that it references.
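As a rough illustration of these select forms, the sketch below uses a child element (name), an attribute (@animalClass), and a more deeply nested path (foodRecipe/name) in a single template. The class name SelectDemo, the helper method, and the sample data are assumptions, as are the xsl:output and xsl:text elements, which are used here only to control the output text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SelectDemo {
    // Hypothetical one-animal stand-in for zooinventory.xml (assumed content).
    static final String XML =
        "<inventory><animal animalClass='mammal'><name>Cocoa</name>"
        + "<foodRecipe><name>Gorilla Chow</name>"
        + "<ingredient>Fruit</ingredient></foodRecipe></animal></inventory>";

    static final String XSL = String.join("\n",
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='animal'>",
        "    <xsl:value-of select='name'/>",             // child element
        "    <xsl:text> is a </xsl:text>",
        "    <xsl:value-of select='@animalClass'/>",     // attribute
        "    <xsl:text> that eats </xsl:text>",
        "    <xsl:value-of select='foodRecipe/name'/>",  // nested element
        "    <xsl:text>.</xsl:text>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies a stylesheet (given as a string) to an XML string.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

Note that both name and foodRecipe/name are resolved relative to the animal element, which is why the two expressions pick up different name elements.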

If we try to add the animal template to our simple example, it won’t generate any output. What’s the problem? If you recall, we said that a template matching an element has the opportunity to process all its children. We already have a template matching the root (“/”), so it is consuming all the input. The answer to our dilemma—and this is where things get a little tricky—is to delegate the matching to other templates using the apply-templates tag. The following example correctly prints the names of all the animals in our document:

<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:template match="/">
      Found the root!
      <xsl:apply-templates/>
   </xsl:template>

   <xsl:template match="animal">
      Name: <xsl:value-of select="name"/>
   </xsl:template>

</xsl:stylesheet>

We still have the opportunity to add output before and after the apply-templates tag. But upon invoking it, the template matching continues from the current node. Next, we’ll use what we have so far and add a few bells and whistles.
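To try the delegation quickly without creating files, the stylesheet can be embedded in a small Java program. This is only a sketch: the class name DelegationDemo, the helper method, and the two-animal sample document are assumptions standing in for the real zooinventory.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DelegationDemo {
    // Hypothetical stand-in for zooinventory.xml (assumed content).
    static final String XML =
        "<inventory>"
        + "<animal><name>Song Fang</name><species>Giant Panda</species></animal>"
        + "<animal><name>Cocoa</name><species>Gorilla</species></animal>"
        + "</inventory>";

    // The root template delegates to other templates via apply-templates.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>",
        "  <xsl:template match='/'>",
        "    Found the root!",
        "    <xsl:apply-templates/>",
        "  </xsl:template>",
        "  <xsl:template match='animal'>",
        "    Name: <xsl:value-of select='name'/>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies a stylesheet (given as a string) to an XML string.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

Because the animal template handles each animal element itself and does not delegate further, the species text should not appear in the result.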

Transforming the Zoo Inventory

Your boss just called, and it’s now imperative that your zoo clients have access to the zoo inventory through the Web, today! After reading Chapter 15, you should be thoroughly prepared to build a nice “zoo app.” Let’s get started by creating an XSL stylesheet to turn our zooinventory.xml into HTML:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="inventory">
  <xs:complexType>
    <xs:sequence>
       <xs:element maxOccurs="unbounded" ref="animal"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="name" type="xs:string"/>

<xs:element name="animal">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element name="species" type="xs:string"/>
      <xs:element name="habitat" type="xs:string"/>
      <xs:choice>
         <xs:element name="food" type="xs:string"/>
         <xs:element ref="foodRecipe"/>
      </xs:choice>
      <xs:element name="temperament" type="xs:string"/>
      <xs:element name="weight" type="xs:double"/>
    </xs:sequence>
    <xs:attribute name="animalClass" default="unknown">
      <xs:simpleType>
        <xs:restriction base="xs:token">
          <xs:enumeration value="unknown"/>
          <xs:enumeration value="mammal"/>
          <xs:enumeration value="reptile"/>
          <xs:enumeration value="bird"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

<xs:element name="foodRecipe">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element maxOccurs="unbounded" name="ingredient" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

</xs:schema>

The stylesheet contains three templates. The first matches /inventory and outputs the beginning of our HTML document (the header) along with the start of a table for the animals. It then delegates using apply-templates before closing the table and adding the HTML footer. The next template matches inventory/animal, printing one row of an HTML table for each animal. Although there are no other animal elements in the document, it still doesn’t hurt to specify that we will match an animal only in the context of an inventory, because, in this case, we are relying on inventory to start and end our table. (This template makes sense only in the context of an inventory.) Finally, we provide a template that matches foodRecipe and prints a small, nested table for that information. foodRecipe makes use of the "for-each" operation to loop over child nodes with a select specifying that we are only interested in ingredient children. For each ingredient, we output its value in a row.

There is one more thing to note in the animal template. Our apply-templates element has a select attribute that limits the elements affected. In this case, we are using the XPath "|" union operator (similar to alternation in regular expressions) to say that we want to apply templates for only the food or foodRecipe child elements. Why do we do this? Because we didn’t match the root of the document (only inventory), we still have the default stylesheet behavior of outputting the plain text of nodes that aren’t matched anywhere else. We take advantage of this behavior to print the text of the food element. But we don’t want to output the text of all of the other elements of animal that we’ve already printed explicitly, so we process only the food and foodRecipe elements. Alternatively, we could have been more verbose, adding a template matching the root and another template just for the food element. That would also mean that new tags added to our XML would, by default, be ignored and not change the output. This may or may not be the behavior you want, and there are other options as well. As with all powerful tools, there is usually more than one way to do something.
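The three templates described above can be sketched as follows, embedded in a small Java program so that the result can be inspected directly. This is an illustrative reconstruction from the description, not the original listing: the HTML details (title, table border, column headings), the xsl:output settings, the class name, and the sample data are all assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ZooInventoryHtmlDemo {
    // Hypothetical stand-in for zooinventory.xml (assumed content).
    static final String XML =
        "<inventory>"
        + "<animal animalClass='mammal'><name>Song Fang</name>"
        + "<species>Giant Panda</species><habitat>China</habitat>"
        + "<food>Bamboo</food>"
        + "<temperament>Friendly</temperament><weight>45.0</weight></animal>"
        + "<animal animalClass='mammal'><name>Cocoa</name>"
        + "<species>Gorilla</species><habitat>Central Africa</habitat>"
        + "<foodRecipe><name>Gorilla Chow</name>"
        + "<ingredient>Fruit</ingredient><ingredient>Shoots</ingredient></foodRecipe>"
        + "<temperament>Know-it-all</temperament><weight>45.0</weight></animal>"
        + "</inventory>";

    static final String XSL = String.join("\n",
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>",
        "  <xsl:output method='html' indent='no'/>",
        // Template 1: page header, start of table, delegation, footer.
        "  <xsl:template match='/inventory'>",
        "    <html><head><title>Zoo Inventory</title></head><body>",
        "    <table border='1'>",
        "      <tr><th>Name</th><th>Species</th><th>Habitat</th><th>Food</th></tr>",
        "      <xsl:apply-templates/>",
        "    </table>",
        "    </body></html>",
        "  </xsl:template>",
        // Template 2: one table row per animal; only food or foodRecipe
        // children are processed further.
        "  <xsl:template match='inventory/animal'>",
        "    <tr>",
        "      <td><xsl:value-of select='name'/></td>",
        "      <td><xsl:value-of select='species'/></td>",
        "      <td><xsl:value-of select='habitat'/></td>",
        "      <td><xsl:apply-templates select='food|foodRecipe'/></td>",
        "    </tr>",
        "  </xsl:template>",
        // Template 3: nested table with one row per ingredient.
        "  <xsl:template match='foodRecipe'>",
        "    <table>",
        "      <tr><td><em><xsl:value-of select='name'/></em></td></tr>",
        "      <xsl:for-each select='ingredient'>",
        "        <tr><td><xsl:value-of select='.'/></td></tr>",
        "      </xsl:for-each>",
        "    </table>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies a stylesheet (given as a string) to an XML string.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

There is no template for food in this sketch, so its text reaches the table cell through the default behavior, while temperament and weight never appear because the select expression skips them.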

XSLTransform

Now that we have a stylesheet, let’s apply it! The following simple program, XSLTransform, uses the javax.xml.transform package to apply the stylesheet to an XML document and print the result. You can use it to experiment with XSL and our example code.

    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;
    
    public class XSLTransform 
    {
        public static void main( String [] args ) throws Exception {
            if ( args.length < 2 || !args[0].endsWith(".xsl") ) {
                System.err.println("usage: XSLTransform file.xsl file.xml");
                System.exit(1);
            }
            String xslFile = args[0], xmlFile = args[1];
    
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = 
                factory.newTransformer( new StreamSource( xslFile ) );
            StreamSource xmlsource = new StreamSource( xmlFile );
            StreamResult output = new StreamResult( System.out );
            transformer.transform( xmlsource, output );
        }
    }

Run XSLTransform, passing the XSL stylesheet and XML input, as in the following command:

% java XSLTransform zooinventory.xsl zooinventory.xml > zooinventory.html

The output should look like Figure 24-2.

Figure 24-2. Image of the zoo inventory table

Constructing the transformer is a process similar to that of getting a SAX or DOM parser. The difference from our earlier use of the TransformerFactory is that this time we construct the transformer by passing it the XSL stylesheet source. The resulting Transformer object is then a dedicated machine that knows how to take input XML and generate output according to its rules.

One important thing to note about XSLTransform is that the Transformer object it uses is not guaranteed to be thread-safe. In our example, we run the transform only once. If you are planning to run the same transform many times, you should take the additional step of getting a Templates object for the transform first, then using it to create Transformers.

Templates templates =
    factory.newTemplates( new StreamSource( args[0] ) );
Transformer transformer = templates.newTransformer();

The Templates object holds the parsed representation of the stylesheet in a compiled form and makes the process of getting a new Transformer much faster. The transformers themselves may also be more highly optimized in this case. The XSLTC processor that ships as the JDK’s default XSL implementation actually generates bytecode for very efficient “translets” that implement the transform. This means that instead of the transformer reading a description of what to do with your XML, it actually produces a small compiled program to execute the instructions!
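One way this pattern might look in a multithreaded program is sketched below: the stylesheet is compiled once into a shared Templates object, and each task creates its own Transformer from it. The class name TemplatesReuseDemo, the helper methods, the pool size, and the sample stylesheet and documents are all assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplatesReuseDemo {
    // Hypothetical stylesheet: prints the name of each animal as plain text.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='animal'>Name: <xsl:value-of select='name'/></xsl:template>",
        "</xsl:stylesheet>");

    // Parse and compile the stylesheet once.
    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each call gets its own Transformer; only the Templates object is shared.
    static String transform(Templates templates, String xml) {
        try {
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Transforms all documents on a small thread pool, preserving input order.
    static List<String> transformAll(Templates templates, List<String> docs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                Callable<String> task = () -> transform(templates, doc);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "<animal><name>Song Fang</name></animal>",
            "<animal><name>Cocoa</name></animal>");
        transformAll(compile(XSL), docs).forEach(System.out::println);
    }
}
```

Sharing the Templates object is safe because the Templates interface is specified to be usable from multiple threads concurrently, whereas a single Transformer instance is not.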

XSL in the Browser

With our XSLTransform example, you can see how you’d go about rendering XML to an HTML document on the server side. But as mentioned in the introduction, modern web browsers support XSL on the client side as well. Browsers can automatically download an XSL stylesheet and use it to transform an XML document. To make this happen, just add a standard XSL stylesheet reference in your XML. You can put the stylesheet directive next to your DOCTYPE declaration in the zooinventory.xml file:

<?xml-stylesheet type="text/xsl" href="zooinventory.xsl"?>

As long as the zooinventory.xsl file is available at the same location (base URL) as the zooinventory.xml file, the browser will use it to render HTML on the client side.
