BUY THIS BOOK
Add to Cart

Print Book $39.95


Safari Books Online

What is this?

Add to UK Cart

Print Book £28.50

What is this?

Looking to Reprint this content?


Java and XSLT
Java and XSLT By Eric M. Burke
September 2001
Pages: 528

Cover | Table of Contents | Colophon


Table of Contents

Chapter 1: Introduction
When XML first appeared, people widely believed that it was the imminent successor to HTML. This viewpoint was influenced by a variety of factors, including media hype, wishful thinking, and simple confusion about the number of new technologies associated with XML. The reality is that millions of web sites are written in HTML, and no widely used browser fully supports XML and its related standards. Even when browser vendors incorporate full support for XML and its family of related technologies, it will take years before enough people use these new versions to justify rewriting most web sites in XML. Although maintaining compatibility with older browsers is essential, companies should not hesitate to move forward with XML and related technologies on the server.
From the browser perspective, HTML will remain dominant on the Web for many years to come. Looking beneath the hood will reveal a much different picture, however, in which HTML is used only during the last instant of presentation. Web applications must support a multitude of browsers, and the easiest way to do this is to simply transform data into HTML before sending it to the client. On the server side, XML is the preferred way to process and exchange data because it is portable, standard, and easy to work with. This is where Java and XSLT enter the picture.
Extensible Stylesheet Language Transformations (XSLT) is designed to transform XML data into some other form, most commonly HTML, XHTML, or another XML format. An XSLT processor , such as Apache's Xalan, performs transformations using one or more XSLT stylesheets , which are also XML documents. As Figure 1-1 illustrates, XSLT can be utilized on the web tier while web browsers on the client tier deal only with HTML.
Figure 1-1: XSLT transformation
Typically in an XSLT- and Java-based web application, XML data is generated dynamically based on database queries. Although some newer databases can export data directly as XML, you will often write custom Java code to extract data using JDBC and convert it to XML. This XML data, such as a customized list of benefit elections or perhaps an airline schedule for a specific time window, may be different for each client using the application. In order to display this XML data on most browsers, it must first be converted to HTML. As Figure 1-1 shows, the XML data is fed into the processor as one input, and an XSLT stylesheet is provided as a second input. The output is then sent directly to the web browser as a stream of HTML. The XSLT stylesheet produces HTML formatting instructions, while the XML provides raw data.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Java, XSLT, and the Web
Extensible Stylesheet Language Transformations (XSLT) is designed to transform XML data into some other form, most commonly HTML, XHTML, or another XML format. An XSLT processor , such as Apache's Xalan, performs transformations using one or more XSLT stylesheets , which are also XML documents. As Figure 1-1 illustrates, XSLT can be utilized on the web tier while web browsers on the client tier deal only with HTML.
Figure 1-1: XSLT transformation
Typically in an XSLT- and Java-based web application, XML data is generated dynamically based on database queries. Although some newer databases can export data directly as XML, you will often write custom Java code to extract data using JDBC and convert it to XML. This XML data, such as a customized list of benefit elections or perhaps an airline schedule for a specific time window, may be different for each client using the application. In order to display this XML data on most browsers, it must first be converted to HTML. As Figure 1-1 shows, the XML data is fed into the processor as one input, and an XSLT stylesheet is provided as a second input. The output is then sent directly to the web browser as a stream of HTML. The XSLT stylesheet produces HTML formatting instructions, while the XML provides raw data.
One of the fundamental problems with HTML is its haphazard implementation. Although the specification for HTML is available from the World Wide Web Consortium (W3C), its evolution was driven mostly by competition between Netscape and Microsoft rather than a thoughtful design process and open standards. This resulted in a bloated language littered with browser-specific tags and varying support for standards. Since no two browsers support the exact same set of HTML features, web authors often limit themselves to a subset of HTML. Another approach is to create and maintain separate copies of each web page, which take advantage of the unique features found in a particular browser. The limitations of HTML are compounded for dynamic sites, in which Java programs are often responsible for accessing enterprise data sources and presenting that information through the browser.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
XML Review
In a nutshell, XML is a format for storing structured data. Although it looks a lot like HTML, XML is much more strict with quotes, properly terminated tags, and other such details. XML does not define tag names, so document authors must invent their own set of tags or look towards a standards organization that defines a suitable XML markup language . A markup language is essentially a set of custom tags with semantic meaning behind each tag; XSLT is one such markup language, since it is expressed using XML syntax.
The terms element and tag are often used interchangeably, and both are used in this book. Speaking from a more technical viewpoint, element refers to the concept being modeled, while tag refers to the actual markup that appears in the XML document. So <account> is a tag that represents an account element in a computer program.
Standard Generalized Markup Language (SGML) forms the basis for HTML, XHTML, XML, and XSLT, but in very different ways for each. Figure 1-2 illustrates the relationships between these technologies.
Figure 1-2: SGML heritage
SGML is a very sophisticated metalanguage designed for large and complex documentation. As a metalanguage, it defines syntax rules for tags but does not define any specific tags. HTML, on the other hand, is a specific markup language implemented using SGML. A markup language defines its own set of tags, such as <h1> and <p>. Because HTML is a markup language instead of a metalanguage, you cannot add new tags and are at the mercy of the browser vendor to properly implement those tags.
XML, as shown in Figure 1-2, is a subset of SGML. XML documents are compatible with SGML documents, however XML is a much smaller language. A key goal of XML is simplicity, since it has to work well on the Web where bandwidth and limited client processing power is a concern. Because of its simplicity, XML is easier to parse and validate, making it a better performer than SGML. XML is also a metalanguage, which explains why XML does not define any tags of its own. XSLT is a particular markup language implemented using XML, and will be covered in detail in the next two chapters.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Beyond Dynamic Web Pages
You probably know a little bit about servlets already. Essentially, they are Java classes that run on the web tier, offering a high-performance, portable alternative to CGI scripts. Java servlets are great for extracting data from a database and then generating XHTML for the browser. They are also good for validating HTTP POST or GET requests from browsers, allowing people to fill out job applications or order books online. But more powerful techniques are required when you create web applications instead of simple web sites.
When compared to GUI applications based on Swing or AWT, developing for the Web can be much more difficult. Most of the difficulties you will encounter can be traced to one of the following:
  • Hypertext Transfer Protocol (HTTP)
  • HTML limitations
  • browser compatibility problems
  • concurrency issues
HTTP is a fairly simple protocol that enables a client to communicate with a server. Web browsers almost always use HTTP to communicate with web servers, although they may use other protocols such as HTTPS for secure connections or even FTP for file downloads. HTTP is a request/response protocol, and the browser must initiate the request. Each time you click on a hyperlink, your browser issues a new request to a web server. The server processes the request and sends a response, thus finishing the exchange.
This request/response cycle is easy to understand but makes it tedious to develop an application that maintains state information as the user moves through a complex web application. For example, as a user adds items to a shopping cart, a servlet must store that data somewhere while waiting for the client to make another request. When that request arrives, the servlet has to associate the cart with that particular client, since the servlet could be dealing with hundreds or thousands of concurrent clients. Other than establishing a timeout period, the servlet has no idea when the client abandons the cart, deciding to shop on a competitor's site instead. The HTTP protocol makes it impossible for the server to initiate a conversation with the client, so the servlet cannot periodically ping the client as it can with a "normal" client/server application.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Getting Started
The best way to get started with new technologies is to experiment. For example, if you do not know XSLT, you should experiment with plenty of stylesheets as you work through the next two chapters. Aside from trying out the examples that appear in this book, you may want to invent a simple XML data file that represents something of interest to you, such as your personal music collection or family tree. Using XSLT stylesheets, try to create web pages that show your data in many different formats.
Once the basics of XSLT are out of the way, servlets will be your next big challenge. Although the servlet API is not particularly difficult to learn, configuration and deployment issues can make it difficult to debug and test your applications. The best advice is to start small, writing a very basic application that proves your environment is configured correctly before moving on to more sophisticated examples. Apache's Tomcat is probably the best servlet container for beginners because it is free, easy to configure, and is the official reference implementation for Sun's servlet API. A servlet container is the server that runs servlets. Chapter 6 covers the essentials of the servlet API, but for all the details you will want to pick up a copy of Java Servlet Programming by Jason Hunter (O'Reilly). You definitely want to get the second edition because it covers the dramatic changes that were introduced in Version 2.2 of the servlet API.
Although this book uses primarily Sun's JAXP and Apache's Xalan, many other XSLT processors are available. Processors based on other languages may offer much higher performance when invoked from the command line, primarily because they do not incur the overhead of a Java Virtual Machine (JVM) at application startup time. When using XSLT from a servlet, however, the JVM is already running, so startup time is no longer an issue. Pure Java processors are great for servlets because of the ease with which they can be embedded into the web application. Simply adding a JAR file to the CLASSPATH is generally all that must be done.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Web Browser Support for XSLT
In a web application environment, performing XSLT transformations on the client instead of the server is valuable for a number of reasons. Most importantly, it reduces the workload on the server machine, allowing a greater number of clients to be served. Once a stylesheet is downloaded to the client, subsequent requests will presumably use a cached copy, therefore only the raw XML data will need to be transmitted with each request. This has the potential to greatly reduce bandwidth requirements.
Even more interesting tricks are possible when JavaScript is introduced into the equation. You can programmatically modify either the XML data or the XSLT stylesheet on the client side, reapply the stylesheet, and see the results immediately without requesting a new document from the server.
Microsoft introduced XSLT support into Version 5.0 of Internet Explorer, but the XSLT specification was not finalized at the time. Unfortunately, significant changes were made to XSLT before it was finally promoted to a W3C Recommendation, but IE had already shipped using the older version of the specification. Although Microsoft has done a good job updating its MSXML parser with full support for the final XSLT Recommendation, millions of users will probably stick to IE 5.0 or 5.5 for quite some time, making it very difficult to perform portable XSLT transformations on the client. For IE 5.0 or 5.5 users, the MSXML parser is available as a separate download from Microsoft. Once downloaded, installed, and configured using a separate program called xmlinst, the browser will be compliant with Version 1.0 of the XSLT recommendation. This is something that developers will want to do, but probably very few end users will have the technical skills to go through these steps.
At the time of this writing, Netscape had not introduced support for XSLT into its browsers. We hope this changes by the time this book is published. Although their implementation will be released much later than Microsoft's, it should be compliant with the latest XSLT Recommendation.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: XSLT Part 1 -- The Basics
Extensible Stylesheet Language (XSL) is a specification from the World Wide Web Consortium (W3C) and is broken down into two complementary technologies: XSL Formatting Objects and XSL Transformations (XSLT). XSL Formatting Objects, a language for defining formatting such as fonts and page layout, is not covered in this book. XSLT, on the other hand, was primarily designed to transform a well-formed XML document into XSL Formatting Objects.
Even though XSLT was designed to support XSL Formatting Objects, it has emerged as the preferred technology for all sorts of transformations. Transformation from XML to HTML is the most common, but XSLT can also be used to transform well-formed XML into just about any text file format. This will give XML- and XSLT-based web sites a major leg up as wireless devices become more prevalent because XSLT can also be used to transform XML into Wireless Markup Language or some other stripped-down format that wireless devices will require.
Why is transformation so important? XML provides a simple syntax for defining markup, but it is up to individuals and organizations to define specific markup languages. There is no guarantee that two organizations will use the exact same markup; in fact, you may struggle to agree on consistent formats within the same group or company. One group may use <employee>, while others may use <worker> or <associate>. In order to share data, the XML data has to be transformed into a common format. This is where XSLT shines -- it eliminates the need to write custom computer programs to transform data. Instead, you simply create one or more XSLT stylesheets.
An XSLT processor is an application that applies an XSLT stylesheet to an XML data source. Instead of modifying the original XML data, the result of the transformation is copied into something called a result tree , which can be directed to a static file, sent directly to an output stream, or even piped into another XSLT processor for further transformations. Figure 2-1 illustrates the transformation process, showing how the XML input, XSLT stylesheet, XSLT processor, and result tree relate to one another.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
XSLT Introduction
Why is transformation so important? XML provides a simple syntax for defining markup, but it is up to individuals and organizations to define specific markup languages. There is no guarantee that two organizations will use the exact same markup; in fact, you may struggle to agree on consistent formats within the same group or company. One group may use <employee>, while others may use <worker> or <associate>. In order to share data, the XML data has to be transformed into a common format. This is where XSLT shines -- it eliminates the need to write custom computer programs to transform data. Instead, you simply create one or more XSLT stylesheets.
An XSLT processor is an application that applies an XSLT stylesheet to an XML data source. Instead of modifying the original XML data, the result of the transformation is copied into something called a result tree , which can be directed to a static file, sent directly to an output stream, or even piped into another XSLT processor for further transformations. Figure 2-1 illustrates the transformation process, showing how the XML input, XSLT stylesheet, XSLT processor, and result tree relate to one another.
Figure 2-1: XSLT transformation
The XML input and XSLT stylesheet are normally two separate entities. For the examples in this chapter, the XML will always reside in a text file. In future chapters, however, we will see how to improve performance by dealing with the XML as an in-memory object tree. This makes sense from a Java/XSLT perspective because most web applications will generate XML dynamically rather than deal with a series of static files. Since the XML data and XSLT stylesheet are clearly separated, it is very plausible to write several different stylesheets that convert the same XML into radically different formats.
XSLT transformation can occur on either the client or server, although server-side transformations are currently dominant. Since a vast majority of Internet users do not use XSLT-compliant browsers (at the time of this writing), the typical model is to transform XML into HTML on the web server so the browser sees only the resulting HTML. In a closed corporate environment where the browser feature set can be controlled, moving the XSLT transformation process to the browser can improve scalability and reduce network traffic.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Transformation Process
Now that we have seen an example, let's back up and talk about some basics. In particular, it is important to understand the relationship between <xsl:template match=...> and <xsl:apply-templates select=...>. This should help to solidify your understanding of the previous example and lay the groundwork for more sophisticated processing. Although XSLT is a language, it is not intended to be a general-purpose programming language. Because of its specialized mission as a transformation language, the design of XSLT works in the way that XML is structured, which is fundamentally a tree data structure.
Every well-formed XML document forms a tree data structure. The document itself is always the root of the tree, and every element within the document has exactly one parent. Since the document itself is the root, it has no parent. As you learn XSLT, it can be helpful to draw pictures of your XML data that show its tree structure. Figure 2-2 illustrates the tree structure for discussionForumHome.xml.
Figure 2-2: Tree structure for discussionForumHome.xml
The document itself is the root of the tree and may contain processing instructions, the document root element, and even comments. XSLT has the ability to select any of these items, although you will probably want to select elements and attributes when transforming to HTML. As mentioned earlier, the "/" pattern matches the document itself, which is the root node of the entire tree.
A tree data structure is fundamentally recursive because it consists of leaf nodes and smaller trees. Each of these smaller trees, in turn, also consist of leaf nodes and still smaller trees. Algorithms that deal with tree structures can almost always be expressed recursively, and XSLT is no exception. The processing model adopted by XSLT is explicitly designed to take advantage of the recursive nature of every well-formed XML document. This means that most stylesheets can be broken down into highly modular, easily understandable pieces, each of which processes a subset of the overall tree (i.e., a subtree).
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Another XSLT Example, Using XHTML
Example 2-5 contains XML data from an imaginary scheduling program. A schedule has an owner followed by a list of appointments. Each appointment has a date, start time, end time, subject, location, and optional notes. Needless to say, a true scheduling application probably has a lot more data, such as repeating appointments, alarms, categories, and many other bells and whistles. Assuming that the scheduler stores its data in XML files, we can easily add features later by writing a stylesheet to convert the existing XML files to some new format.
Example 2-5. schedule.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="schedule.xslt"?>
<schedule>
  <owner>
    <name>
      <first>Eric</first>
      <last>Burke</last>
    </name>
  </owner>
  <appointment>
    <when>
      <date month="03" day="15" year="2001"/>
      <startTime hour="09" minute="30"/>
      <endTime hour="10" minute="30"/>
    </when>
    <subject>Interview potential new hire</subject>
    <location>Rm 103</location>
    <note>Ask Bob for an updated resume.</note>
  </appointment>
  <appointment>
    <when>
      <date month="03" day="15" year="2001"/>
      <startTime hour="15" minute="30"/>
      <endTime hour="16" minute="30"/>
    </when>
    <subject>Dr. Appointment</subject>
    <location>1532 Main Street</location>
  </appointment>
  <appointment>
    <when>
      <date month="03" day="16" year="2001"/>
      <startTime hour="11" minute="30"/>
      <endTime hour="12" minute="30"/>
    </when>
    <subject>Lunch w/Boss</subject>
    <location>Pizza Place on First Capitol Drive</location>
  </appointment>
</schedule>
As you can see, the XML document uses both attributes (month="03") and child elements to represent its data. XSLT has the ability to search for and transform both types of data, as well as comments, processing instructions, and text. In our current document, the appointments are stored in chronological order. Later, we will see how to change the sort order using
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
XPath Basics
XPath is another recommendation from the W3C and is designed for use by XSLT and another technology called XPointer. The primary goal of XPath is to define a mechanism for addressing portions of an XML document, which means it is used for locating element nodes, attribute nodes, text nodes, and anything else that can occur in an XML document. XPath treats these nodes as part of a tree structure rather than dealing with XML as a text string. XSLT also relies on the tree structure that XPath defines. In addition to addressing, XPath contains a set of functions to format text, convert to and from numbers, and deal with booleans.
Unlike XSLT, XPath itself is not expressed using XML syntax. A simplified syntax makes sense when you consider that XPath is most commonly used inside of attribute values within other XML documents. XPath includes both a verbose syntax and a set of abbreviations, which end up looking a lot like path names on a file system or web site.
XSLT uses XPath in three basic ways:
  • To select and match patterns in the original XML data. Using XPath in this manner is the focus of this chapter. You see this most often in <xsl:template match="pattern"> and <xsl:apply-templates select="node-set-expression"/>. In either case, XPath syntax is used to locate various types of nodes.
  • To support conditional processing. We will see the exact syntax of <xsl:if> and <xsl:choose> in the next chapter, both of which rely on XPath's ability to represent boolean values of true and false.
  • To generate text. A number of string formatting instructions are provided, giving you the ability to concatenate strings, manipulate substrings, and convert from other data types to strings. Again, this will be covered in the next chapter.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Looping and Sorting
As shown throughout this chapter, you can use <xsl:apply-templates ...> to search for patterns in an XML document. This type of processing is sometimes referred to as a " data driven" approach because the data of the XML file drives the selection process. Another style of XSLT programming is called "template driven," which means that the template's code tends to drive the selection process.
Sometimes it is convenient to explicitly drive the selection process with an <xsl:for-each> element, which is reminiscent of traditional programming techniques. In this approach, you explicitly loop over a collection of nodes without instantiating a separate template as <xsl:apply-templates> does. The syntax for <xsl:for-each> is as follows:
<xsl:for-each select="president">
  ...content for each president element
</xsl:for-each>
The select attribute can contain any XPath location path, and the loop will iterate over each element in the resulting node set. In this example, the context is <president> for all content within the loop. Nested loops are possible and could be used to loop over the list of <vicePresident> elements.
Sorting can be applied in either a data-driven or template-driven approach. In either case, <xsl:sort> is added as a child element to something else. By adding several consecutive <xsl:sort> elements, you can accomplish multifield sorting. Each sort can be in ascending or descending order, and the data type for sorting is either "number" or "text". The sort order defaults to ascending. Some examples of <xsl:sort> include:
<xsl:sort select="first"/>
<xsl:sort select="last" order="descending"/>
<xsl:sort select="term/@from" order="descending" data-type="number"/>
<xsl:sort select="name/first" data-type="text" case-order="upper-first"/>
In the last line, the case-order attribute specifies that uppercase letters should be alphabetized before lowercase letters. The other accepted value for this attribute is
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Outputting Dynamic Attributes
Let's assume we have an XML document that lists books in a personal library, and we want to create an HTML document with links to these books on Amazon.com. In order to generate the hyperlink, the href attribute must contain the ISBN of the book, which can be found in our original XML data. An example of the URL we would like to generate is as follows:
<a href="http://www.amazon.com/exec/obidos/ASIN/0596000162">Java and XML</a>
One thought is to include <xsl:value-of select="isbn"/> directly inside of the attribute. However, XML does not allow you to insert the less-than (<) character inside of an attribute value:
<!-- won't work... -->
<a href="<xsl:value-of select="isbn"/>">Java and XML</a>
We also need to consider that the attribute value is dynamic rather than static. XSLT does not automatically recognize content of the href="..." attribute as an XPath expression, since the <a> tag is not part of XSLT. There are two possible solutions to this problem.
In the first approach, <xsl:attribute> is used to add one or more attributes to elements. In the following template, an href attribute is added to an <a> element:
<xsl:template match="book">
  <li>
    <a>  <!-- the href attribute is generated below -->
      <xsl:attribute name="href">
        <xsl:text>http://www.amazon.com/exec/obidos/ASIN/</xsl:text>
        <xsl:value-of select="@isbn"/>
      </xsl:attribute>
      <xsl:value-of select="title"/>
    </a>
  </li>
</xsl:template>
The <li> tag is used because this is part of a larger stylesheet that presents a bulleted list of links to each book. The <a> tag, as you can see, is missing its href attribute. The <xsl:attribute> element adds the missing href. Any child content of <xsl:attribute> is added to the attribute value. Because we do not want to introduce any unnecessary whitespace, <xsl:text> is used. Finally, <xsl:value-of> is used to select the isbn attribute.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 3: XSLT Part 2 -- Beyond the Basics
As you may have guessed, this chapter is a continuation of the material presented in the previous chapter. The basic syntax of XSLT should make sense by now. If not, it is probably a good idea to sit down and write a few stylesheets to gain some basic familiarity with the technology. What we have seen so far covers the basic mechanics of XSLT but does not take full advantage of the programming capabilities this language has to offer. In particular, this chapter will show how to write more reusable, modular code through features such as named templates, parameters, and variables.
The chapter concludes with a real-world example that uses XSLT to produce HTML documentation for Ant build files. Ant is a Java build tool that uses XML files instead of Makefiles to drive the compilation process. Since XML is used, XSLT is a natural choice for producing documentation about the build process.
In the previous chapter, we saw a template that output the name of a president or vice president. Its basic job was to display the first name, middle name, and last name. A nonbreaking space was printed between each piece of data so the fields did not run into each other. What we did not see was that many presidents do not have middle names, so our template ended up printing the first name, followed by two spaces, followed by the last name. To fix this, we need to check for the existence of a middle name before simply outputting its content and a space. This requires conditional logic , a feature found in just about every programming language in existence.
XSLT provides two mechanisms that support conditional logic: <xsl:if> and <xsl:choose>. These allow a stylesheet to produce different output depending on the results of a boolean expression , which must yield true or false as defined by the XPath specification.
The behavior of the <xsl:if> element is comparable to the following Java code:
if (boolean-expression) {
  // do something
}
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Conditional Processing
In the previous chapter, we saw a template that output the name of a president or vice president. Its basic job was to display the first name, middle name, and last name. A nonbreaking space was printed between each piece of data so the fields did not run into each other. What we did not see was that many presidents do not have middle names, so our template ended up printing the first name, followed by two spaces, followed by the last name. To fix this, we need to check for the existence of a middle name before simply outputting its content and a space. This requires conditional logic , a feature found in just about every programming language in existence.
XSLT provides two mechanisms that support conditional logic: <xsl:if> and <xsl:choose>. These allow a stylesheet to produce different output depending on the results of a boolean expression , which must yield true or false as defined by the XPath specification.
The behavior of the <xsl:if> element is comparable to the following Java code:
if (boolean-expression) {
  // do something
}
In XSLT, the syntax is as follows:
<xsl:if test="boolean-expression">
  <!-- Content: template -->
</xsl:if>
The test attribute is required and must contain a boolean expression. If the result is true, the content of this element is instantiated; otherwise, it is skipped. The code in Example 3-1 illustrates several uses of <xsl:if> and related XPath expressions. Code that is highlighted will be discussed in the next several paragraphs.
Example 3-1. <xsl:if> examples
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!--******************************************************
      ** "/" template
      ***************************************************-->
  <xsl:template match="/">
    <html>
      <body>
        <h1>Conditional Processing Examples</h1>
        <xsl:apply-templates select="presidents"/>
      </body>
    </html>
  </xsl:template>
  <!--******************************************************
      ** "presidents" template
      ***************************************************-->
  <xsl:template match="presidents">
    <h3>
      List of
        
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Parameters and Variables
As in other programming languages, it is often desirable to set up a variable whose value is reused in several places throughout a stylesheet. If the title of a book is displayed repeatedly, then it makes sense to store that title in a variable rather than scan through the XML data and locate the title repeatedly. It can also be beneficial to set up a variable once and pass it as a parameter to one or more templates. These templates often use <xsl:if> or <xsl:choose> to produce different content depending on the value of the parameter that was passed.
Variables in XSLT are defined with the <xsl:variable> element and can be global or local. A global variable is defined at the "top-level" of a stylesheet, which means that it is defined outside of any templates as a direct child of the <xsl:stylesheet> element. Top-level variables are visible throughout the entire stylesheet, even in templates that occur before the variable declaration.
The other place to define a variable is inside of a template. These variables are visible only to elements that follow the <xsl:variable> declaration within that template and to their descendants. The code in Example 3-2 showed this form of <xsl:variable> as a mechanism to define the font color.

Section 3.2.1.1: Defining variables

Variables can be defined in one of three ways:
<xsl:variable name="homePage">index.html</xsl:variable>
<xsl:variable name="lastPresident"select="president[position() = last(  )]/name"/>
<xsl:variable name="empty"/>
In the first example, the content of <xsl:variable> specifies the variable value. In the simple example listed here, the text index.html is assigned to the homePage variable. More complex content is certainly possible, as shown earlier in Example 3-2.
The second way to define a variable relies on the select attribute. The value is an XPath expression, so in this case we are selecting the name of the last president in the list.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Combining Multiple Stylesheets
Through template parameters, named templates, and template modes, we have seen how to create more reusable fragments of code that begin to resemble function calls. By combining multiple stylesheets, one can begin to develop libraries of reusable XSLT templates that can dramatically increase productivity.
Productivity gains occur because programmers are not writing the same code over and over for each stylesheet. Reusable code is placed into a single stylesheet and imported or included into other stylesheets. Another advantage of this technique is maintainability. XSLT syntax can get ugly, and modularizing code into small fragments can greatly enhance readability. For example, we have seen several examples related to the list of presidents so far. Since we almost always want to display the name of a president or vice president, name-formatting templates should be broken out into a separate stylesheet. Example 3-7 shows a stylesheet designed for reuse by other stylesheets.
Example 3-7. nameFormatting.xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!--
    ** Show a name formatted like: "Burke, Eric Matthew"
    -->
  <xsl:template match="name" mode="lastFirstMiddle">
    <xsl:value-of select="last"/>
    <xsl:text>, </xsl:text>
    <xsl:value-of select="first"/>
    <xsl:for-each select="middle">
      <xsl:text> disable-output-escaping="yes">&amp;nbsp;</xsl:text>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>

  <!--
    ** Show a name formatted like: "Eric Matthew Burke"
    -->
  <xsl:template match="name" mode="firstMiddleLast">
    <xsl:value-of select="first"/>
    <xsl:for-each select="middle">
      <xsl:text> disable-output-escaping="yes">&amp;nbsp;</xsl:text>
      <xsl:value-of select="."/>
    </xsl:for-each>
    <xsl:text> disable-output-escaping="yes">&amp;nbsp;</xsl:text>
    <xsl:value-of select="last"/>
  </xsl:template>
</xsl:stylesheet>
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Formatting Text and Numbers
XSLT and XPath define a small set of functions to manipulate text and numbers. These allow you to concatenate strings, extract substrings, determine the length of a string, and perform other similar tasks. While these features do not approach the capabilities offered by a programming language like Java, they do allow for some of the most common string manipulation tasks.
The format-number( ) function is provided by XSLT to convert numbers such as 123 into formatted numbers such as $123.00. The function takes the following form:
string format-number(number, string, string?)
The first parameter is the number to format, the second is a format string, and the third (optional) is the name of an <xsl:decimal-format> element. We will cover only the first two parameters in this book. Interestingly enough, the behavior of the format-number( ) function is defined by the JDK 1.1.x version of the java.text.DecimalFormat class. For complete information on the syntax of the second argument, refer to the JavaDocs for JDK 1.1.x.
Outputting currencies is a common use for the format-number( ) function. The pattern $#,##0.00 can properly format a number into just about any U.S. currency. Table 3-2 demonstrates several possible inputs and results for this pattern.
Table 3-2: Formatting currencies using $#,##0.00
Number
Result
0
$0.00
0.9
$0.90
0.919
$0.92
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Schema Evolution
Looking beyond HTML generation, a key use for XSLT is transforming one form of XML into another form. In many cases, these are not radical transformations, but minor enhancements such as adding new attributes, changing the order of elements, or removing unused data. If you have only a handful of XML files to transform, it is a lot easier to simply edit the XML directly rather than going through the trouble of writing a stylesheet. But in cases where a large collection of XML documents exist, a single XSLT stylesheet can perform transformations on an entire library of XML files in a single pass. For B2B applications, schema evolution is useful when different customers require the same data, but in different formats.
Let's suppose that you wrote a logging API for your Java programs. Log files are written in XML and are formatted as shown in Example 3-10.
Example 3-10. Log file before transformation
<?xml version="1.0" encoding="UTF-8"?>
<log>
  <message text="input parameter was null">
    <type>ERROR</type>
    <when>
      <year>2000</year>
      <month>01</month>
      <day>15</day>
      <hour>03</hour>
      <minute>12</minute>
      <second>18</second>
    </when>
    <where>
      <class>com.foobar.util.StringUtil</class>
      <method>reverse(String)</method>
    </where>
  </message>
  <message text="cannot read config file">
    <type>WARNING</type>
    <when>
      <year>2000</year>
      <month>01</month>
      <day>15</day>
      <hour>06</hour>
      <minute>35</minute>
      <second>44</second>
    </when>
    <where>
      <class>com.foobar.servlet.MainServlet</class>
      <method>init(  )</method>
    </where>
  </message>
  <!-- more messages ... -->
</log>
As you can see from this example, the file format is quite verbose. Of particular concern is how the date and time are written. Since log files can be quite large, it would be a good idea to select a more concise format for this information. Additionally, the text is stored as an attribute on the
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Ant Documentation Stylesheet
Apache's Ant has taken the Java development community by storm, supplementing traditional Java IDEs and outright replacing Makefiles on most Java development projects. Ant is a build tool, similar to the make utility, only it uses XML files instead of Makefiles. In addition to a portable build file based on XML, Ant itself is written in Java and has few platform-specific dependencies. Finally, since Ant can reuse the same running instance of the Java Virtual Machine for nearly every step of the build process, it is blazingly fast. Ant can be downloaded from http://jakarta.apache.org and is open source software.
Ant is driven by an XML build file , which consists of one project . This project contains one or more targets , and targets can have dependencies on one another. The project and targets are represented as <project> and <target> in the XML build file; <project> must be the document root element. It is common to have a "prepare" target that builds the output directories and a "compile" target that depends on the "prepare" target. If you tell Ant to execute the "compile" target, it first checks to see that the "prepare" target has created the necessary directories. The structure of an Ant build file looks like this:
<?xml version="1.0"?>
<project name="SampleProject" default="compile" basedir=".">

  <!-- global properties -->
  <property name="srcdir" value="src"/>
  <property name="builddir" value="build"/>

  <target name="prepare" description="Creates the output directories">
    ...tasks
  </target>

  <target name="compile" depends="prepare">
    ...tasks
  </target>

  <target name="distribute" depends="compile">
    ...tasks
  </target>
</project>
For each target, Ant is smart enough to know if files have been modified and if it needs to do any work. For compilation, the timestamps of .class files are compared to timestamps of .java files. Through these dependencies, Ant can avoid unnecessary compilation and perform quite well. Although the targets shown here contain only single dependencies, it is possible for a target to depend on several other targets:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 4: Java-Based Web Technologies
In a perfect world, a single web development technology would be inexpensive, easy to maintain, offer rapid response time, and be highly scalable. It would also be portable to any operating system or hardware platform and would adapt well to future requirement changes. It would support access from wireless devices, standalone client applications, and web browsers, all with minimal changes to code.
No perfect solution exists, nor is one likely to exist anytime soon. If it did, many of us would be out of work. A big part of software engineering is recognizing that tradeoffs are inevitable and knowing when to sacrifice one set of goals in order to deliver the maximum value to your customer or business. For example, far too many programmers focus on raw performance metrics without any consideration for ease of development or maintainability by nonexperts. These decisions are hard and are often subjective, based on individual experience and preferences.
The goal of this chapter is to look at the highlights of several popular technologies for web application development using Java and see how each measures up to an XSLT-based approach. The focus is on architecture, which implies a high-level viewpoint without emphasis on specific implementation details. Although XSLT offers a good balance between performance, maintainability, and flexibility, it is not the right solution for all applications. It is hoped that the comparisons made here will help you decide if XSLT is the right choice for your web applications.
Before delving into more sophisticated options, let's step back and look at a few basic approaches to web development using Java. For small web applications or moderately dynamic web sites, these approaches may be sufficient. As you might suspect, however, none of these approaches hold up as well as XML and XSLT when your sites get more complex.
Common Gateway Interface (CGI) is a protocol for interfacing external applications, which can be written in just about any language, with web servers. The most common language choices for CGI are C and Perl. This interface is accomplished in a number of ways, depending on the type of request. For example, parameters associated with an
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Traditional Approaches
Before delving into more sophisticated options, let's step back and look at a few basic approaches to web development using Java. For small web applications or moderately dynamic web sites, these approaches may be sufficient. As you might suspect, however, none of these approaches hold up as well as XML and XSLT when your sites get more complex.
Common Gateway Interface (CGI) is a protocol for interfacing external applications, which can be written in just about any language, with web servers. The most common language choices for CGI are C and Perl. This interface is accomplished in a number of ways, depending on the type of request. For example, parameters associated with an HTTP GET request are passed to the CGI script via the QUERY_STRING environment variable. HTTP POST data, on the other hand, is piped to the standard input stream of the CGI script. CGI always sends results back to the web server via its standard output.
Ordinary CGI programs are invoked from the web server as external programs, which is the most notable difference when compared with servlets. With each request from the browser, the web server spawns a new process to run the CGI program. Aside from the obvious performance penalty, this also makes it difficult to maintain state information between requests. A web-based shopping cart is a perfect example of state information that must be preserved between requests. Figure 4-1 illustrates the CGI process.
Figure 4-1: CGI process
FastCGI is an alternative to CGI with two notable differences. First, FastCGI processes do not exit with each request/response cycle. Second, the environment variable and pipe I/O mechanism of CGI has been eschewed in favor of TCP connections, allowing FastCGI programs to be distributed to different servers. The net result is that FastCGI eliminates the most vexing problems of CGI while making it easy to salvage existing CGI programs.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The Universal Design
Despite the proliferation of APIs, frameworks, and template engines, most web application approaches seem to be consolidating around the idea of model-view-controller (MVC). Clean separation between data, presentation, and programming logic is a key goal of this design. Most web frameworks implement this pattern, and the hybrid approach of JSP and servlets follows it. XSLT implementations also use this pattern, which leads to the conclusion that model-view-controller is truly a universal approach to development on the web tier.
A framework is a value-added class library that makes it easier to develop certain types of applications. For example, an imaging framework may contain APIs for reading, writing, and displaying several image formats. This makes it much easier to build applications because someone else already figured out how to structure your application.
Servlet frameworks are no different. Now that servlets, JSP, and hybrid approaches have been available for a few years, common architectural patterns are emerging as "best practices." These include separation of Java code and HTML generation, using servlets in conjunction with JSP, and other variations. Once basic patterns and themes are understood, it becomes desirable to write common frameworks that automate the mundane tasks of building web applications.
The most important tradeoff you make when selecting a framework is vendor lock-in versus open standards. At this time, there are no open standards for frameworks. Although there are numerous open source frameworks, none is backed by a standards organization or even Sun's Java Community Process. The low-level servlet and JSP APIs are very well defined and widely implemented Java standard extensions. But a framework can offer much more sophisticated features such as enhanced error checking, database connection pooling, custom tag libraries, and other value-added features. As you add more framework-specific features, however, your flexibility to choose another framework or vendor quickly diminishes.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
XSLT and EJB
Now that the options for web tier development have been examined, let's look at how the web tier interacts with other tiers in large enterprise class systems. A typical EJB architecture involves a thin browser client, a servlet-driven web tier, and EJB on an application server tier. Figure 4-7 expands upon the conceptual XSLT model presented earlier.
Figure 4-7: XSLT and EJB architecture
This diagram is much closer to the true physical model of a multitier web application that uses XSLT. The arrows indicate the overall flow of a single request, originating with the client. This client is typically a web browser, but it could be a cell phone or some other device. The client request goes to a single servlet and is handed off to something called RequestHandler . In the pattern outlined here, you create numerous subclasses of RequestHandler. Each subclass is responsible for validation and presentation logic for a small set of related functions. One manageable strategy is to design one subclass of RequestHandler for each web page in the application. Another approach is to create fine-grained request handlers that handle one specific task, which can be beneficial if the same piece of functionality is invoked from many different screens in your application.
The request handler interacts with the application server via EJB components. The normal pattern is to execute commands on session beans, which in turn get their data from entity beans. The internal behavior of the EJB layer is irrelevant to the web tier, however. Once the EJB method call is complete, one or more "data objects" are returned to the web tier. From this point, the data object must be converted to XML.
The conversion to XML can be handled in a few different ways. One common approach is to write methods in the data objects themselves that know how to generate a fragment of XML, or perhaps an entire document. Another approach is to write an XML adapter class for each data object. Instead of embedding the XML generation code into the data object, the adapter class generates the XML. This approach has the advantage of keeping the data objects lightweight and clean, but it does result in additional classes to write. In either approach, it is preferable to return XML as a DOM or JDOM tree, rather than raw XML text. If the XML is return