Earlier in this chapter, we used a Transformer object to copy a DOM
representation of an example back to XML text. We mentioned that we were
not really tapping the potential of the Transformer. Now, we’ll give you
the full story.
The javax.xml.transform
package is the API for using the XSL/XSLT transformation language. XSL
stands for Extensible Stylesheet Language. Like Cascading Style Sheets (CSS) for HTML, XSL allows us to
“mark up” XML documents by adding tags that provide presentation
information. XSL Transformation (XSLT) takes this further by adding the
ability to completely restructure the XML and produce arbitrary output.
XSL and XSLT together make up their own programming language for
processing an XML document as input and producing another (usually XML)
document as output. (From here on in, we’ll refer to them collectively as
XSL.)
XSL is extremely powerful, and new applications for its use arise every day. For example, consider a website that is frequently updated and that must provide access to a variety of mobile devices and traditional browsers. Rather than recreating the site for these and additional platforms, XSL can transform the content to an appropriate format for each platform. More generally, rendering content from XML is simply a better way to preserve your data and keep it separate from your presentation information. XSL can be used to render an entire website in different styles from files containing “pure data” in XML, much like a database. Multilingual sites also benefit from XSL to lay out text in different ways for different audiences.
You can probably guess the caveat that we’re going to issue: XSL is a big topic worthy of its own books (see, for example, O’Reilly’s Java and XSLT by Eric Burke), and we can only give you a taste of it here. Furthermore, some people find XSL difficult to understand at first glance because it requires thinking in terms of recursively processing document tags. In recent years, much of the impetus behind XSL as a way to produce web-based content has fallen away in favor of using more JavaScript on the client. However, XSL remains a powerful way to transform XML and is widely used in other document-oriented applications.
XSL is itself an XML vocabulary: an XSL stylesheet is an XML document that uses special tags defined by the XSL namespace to describe the transformation. The most basic XSL operations involve matching parts of the input XML document and generating output based on their contents. One or more XSL templates live within the stylesheet and are called in response to tags appearing in the input. XSL is often used in a purely input-driven way, where input XML tags trigger output in the order in which they appear, using only the information they contain. But more generally, the output can be constructed from arbitrary parts of the input, drawing from it like a database, composing elements and attributes. The XSLT transformation part of XSL adds things like conditionals and iteration to this mix, which enable any kind of output to be generated based on the input.
An XSL stylesheet contains a stylesheet tag as its root element.
By convention, the stylesheet defines a namespace prefix xsl for the XSL
namespace. Within the stylesheet are one or more template tags, each
containing a match attribute that describes the element upon which it
operates.
    <xsl:stylesheet
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

        <xsl:template match="/">
            I found the root of the document!
        </xsl:template>

    </xsl:stylesheet>
When a template matches an element, it has an opportunity to handle all the children of the element. The simple stylesheet shown here has one template that matches the root of the input document and simply outputs some plain text. By default, input not matched is simply copied to the output with its tags stripped (HTML convention). But here we match the root so we consume the entire input and nothing but our message appears on the output.
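To see this behavior for yourself, here is a minimal sketch that applies the one-template stylesheet to a small in-memory document. The class name RootTemplateDemo, the transform() helper, and the sample XML are our own illustrative assumptions, not part of the chapter's example code; we also suppress the XML declaration so that only the template's output is visible.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RootTemplateDemo {
    // The single-template stylesheet from the text
    public static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:template match='/'>I found the root of the document!</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical sample input, made up for this sketch
    public static final String XML =
        "<inventory><animal><name>Song Fang</name></animal></inventory>";

    // Hypothetical helper: apply a stylesheet string to an XML string
    public static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            // Leave out the <?xml ...?> header so we see only template output
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The root template consumes everything; the animal's name never appears
        System.out.println(transform(XSL, XML));
    }
}
```

Because the root template never delegates to anything else, the text inside the name element does not show up in the result.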
The match
attribute can
refer to elements using the XPath notation that we described earlier.
This is a hierarchical path starting with the root element. For example,
match="/inventory/animal"
would match
only the animal
elements from our
zooinventory.xml file. In XSL, the path may be
absolute (starting with “/”) or relative, in which case, the template
detects whenever that element appears in any subcontext (equivalent to
“//” in XPath).
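The difference between an absolute and a relative match can be sketched with a small experiment. Everything here is an assumption made for illustration: the class MatchPatternDemo, its helper methods, and in particular the section element, which does not exist in our zoo inventory and is added only to nest a second animal more deeply.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class MatchPatternDemo {
    // Hypothetical input: one animal directly under inventory, one nested deeper
    public static final String XML =
        "<inventory><animal/><section><animal/></section></inventory>";

    // Build a stylesheet whose only template uses the given match pattern
    public static String stylesheetFor(String pattern) {
        return "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
             + "<xsl:template match='" + pattern + "'>[animal]</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Run the one-template stylesheet against the sample input
    public static String run(String pattern) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(stylesheetFor(pattern))));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Relative pattern: fires for an animal at any depth
        System.out.println(run("animal"));            // [animal][animal]
        // Absolute pattern: fires only for animals directly under /inventory
        System.out.println(run("/inventory/animal")); // [animal]
    }
}
```

Elements that no template matches are handled by the default rules, which simply keep descending into their children; that is how the nested animal is reached in the first run.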
Within the template, we can put whatever we want as long as it is
well-formed XML (if not, we can use a CDATA section or XInclude). But
the real power comes when we use parts of the input to generate output.
The XSL value-of tag is used to output the content of an element or of
one of its children. For example, the following template would match an
animal element and output the value of its name child element:
    <xsl:template match="animal">
        Name: <xsl:value-of select="name"/>
    </xsl:template>
The select attribute uses an XPath expression relative to the current
node. In this case, we tell it to print the value of the name element
within animal. We could have used a relative path to a more deeply nested
element within animal or even an absolute path to another part of the
document. To refer to the “current” element (in this case, the animal
element itself), a select expression can use “.” as the path. The select
expression can also retrieve attributes from the elements that it
references.
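As a rough illustration of these three kinds of select expression (a child element, an attribute, and the current node), consider the following sketch. The class SelectDemo and the sample data are assumptions made for this example; the @animalClass syntax is the standard XPath way of addressing an attribute.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SelectDemo {
    // One template that exercises three select expressions
    public static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:template match='animal'>"
      +   "name=<xsl:value-of select='name'/>"            // child element
      +   "; class=<xsl:value-of select='@animalClass'/>" // attribute
      +   "; all=<xsl:value-of select='.'/>"              // the current node
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical sample input, written without whitespace between tags
    public static final String XML =
        "<inventory><animal animalClass='mammal'>"
      + "<name>Song Fang</name><species>Giant Panda</species>"
      + "</animal></inventory>";

    public static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints: name=Song Fang; class=mammal; all=Song FangGiant Panda
        System.out.println(run());
    }
}
```

Note that the value of “.” for an element is the concatenated text of everything inside it, which is why the name and species run together in the last field.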
If we try to add the animal
template to our simple example, it won’t generate any output. What’s the
problem? If you recall, we said that a template matching an element has
the opportunity to process all its children. We already have a template
matching the root (“/”), so it is consuming all the input. The answer to
our dilemma—and this is where things get a little tricky—is to delegate
the matching to other templates using the apply-templates
tag.
The following example correctly prints the names of all the animals in
our document:
    <xsl:stylesheet
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

        <xsl:template match="/">
            Found the root!
            <xsl:apply-templates/>
        </xsl:template>

        <xsl:template match="animal">
            Name: <xsl:value-of select="name"/>
        </xsl:template>

    </xsl:stylesheet>
We still have the opportunity to add output before and after the
apply-templates
tag. But upon
invoking it, the template matching continues from the current node.
Next, we’ll use what we have so far and add a few bells and
whistles.
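Before moving on, the effect of apply-templates can be checked with a small experiment that runs the two-template stylesheet twice, once with and once without the delegation. This is only a sketch: the class ApplyTemplatesDemo, its methods, and the sample animals are assumptions, and we have added explicit newlines with xsl:text purely to make the output easier to read.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ApplyTemplatesDemo {
    // Hypothetical sample input with two animals
    public static final String XML =
        "<inventory>"
      + "<animal><name>Song Fang</name></animal>"
      + "<animal><name>Cocoa</name></animal>"
      + "</inventory>";

    // Build the stylesheet with or without <xsl:apply-templates/> in the root template
    public static String stylesheet(boolean delegate) {
        return "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
             + "<xsl:template match='/'>Found the root!<xsl:text>&#10;</xsl:text>"
             + (delegate ? "<xsl:apply-templates/>" : "")
             + "</xsl:template>"
             + "<xsl:template match='animal'>"
             +   "Name: <xsl:value-of select='name'/><xsl:text>&#10;</xsl:text>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    public static String run(boolean delegate) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(stylesheet(delegate))));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(run(false)); // only "Found the root!"
        System.out.println("----");
        System.out.print(run(true));  // the message, then one "Name:" line per animal
    }
}
```

In the first run the animal template is present but never gets a chance to fire; in the second, the root template hands the children back to the matching process and both names appear.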
Your boss just called, and it’s now imperative that your zoo clients have access to the zoo inventory through the Web, today! After reading Chapter 15, you should be thoroughly prepared to build a nice “zoo app.” Let’s get started by creating an XSL stylesheet to turn our zooinventory.xml into HTML:
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

        <xs:element name="inventory">
            <xs:complexType>
                <xs:sequence>
                    <xs:element maxOccurs="unbounded" ref="animal"/>
                </xs:sequence>
            </xs:complexType>
        </xs:element>

        <xs:element name="name" type="xs:string"/>

        <xs:element name="animal">
            <xs:complexType>
                <xs:sequence>
                    <xs:element ref="name"/>
                    <xs:element name="species" type="xs:string"/>
                    <xs:element name="habitat" type="xs:string"/>
                    <xs:choice>
                        <xs:element name="food" type="xs:string"/>
                        <xs:element ref="foodRecipe"/>
                    </xs:choice>
                    <xs:element name="temperament" type="xs:string"/>
                    <xs:element name="weight" type="xs:double"/>
                </xs:sequence>
                <xs:attribute name="animalClass" default="unknown">
                    <xs:simpleType>
                        <xs:restriction base="xs:token">
                            <xs:enumeration value="unknown"/>
                            <xs:enumeration value="mammal"/>
                            <xs:enumeration value="reptile"/>
                            <xs:enumeration value="bird"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:attribute>
            </xs:complexType>
        </xs:element>

        <xs:element name="foodRecipe">
            <xs:complexType>
                <xs:sequence>
                    <xs:element ref="name"/>
                    <xs:element maxOccurs="unbounded" name="ingredient" type="xs:string"/>
                </xs:sequence>
            </xs:complexType>
        </xs:element>

    </xs:schema>
The stylesheet contains three templates. The first matches /inventory
and outputs the beginning of our HTML document (the header) along with
the start of a table for the animals. It then delegates using
apply-templates before closing the table and adding the HTML footer. The
next template matches inventory/animal, printing one row of an HTML table
for each animal. Although there are no other animal elements in the
document, it still doesn’t hurt to specify that we will match an animal
only in the context of an inventory, because, in this case, we are
relying on inventory to start and end our table. (This template makes
sense only in the context of an inventory.) Finally, we provide a
template that matches foodRecipe and prints a small, nested table for
that information. foodRecipe makes use of the for-each operation to loop
over child nodes with a select specifying that we are only interested in
ingredient children. For each ingredient, we output its value in a row.
There is one more thing to note in the animal template. Our
apply-templates element has a select attribute that limits the elements
affected. In this case, we are using the “|” operator (the XPath union
operator, which reads much like alternation in a regular expression) to
say that we want to apply templates for only the food or foodRecipe child
elements. Why do we do this? Because we didn’t match the root of the
document (only inventory), we still have the default stylesheet behavior
of outputting the plain text of nodes that aren’t matched anywhere else.
We take advantage of this behavior to print the text of the food element.
But we don’t want to output the text of all of the other elements of
animal that we’ve already printed explicitly, so we process only the food
and foodRecipe elements. Alternatively, we could have been more verbose,
adding a template matching the root and another template just for the
food element. That would also mean that new tags added to our XML would,
by default, be ignored and not change the output. This may or may not be
the behavior you want, and there are other options as well. As with all
powerful tools, there is usually more than one way to do something.
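A stylesheet matching this three-template description might look like the following sketch, embedded in a small Java program so that it can be run directly. Treat it as our reconstruction from the prose rather than the definitive listing: the class ZooHtmlSketch, the exact HTML layout (title, column headings, border), and the sample animals are all assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ZooHtmlSketch {
    public static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      // 1. Header and table start, delegate to other templates, then close up
      + "<xsl:template match='/inventory'>"
      +   "<html><head><title>Zoo Inventory</title></head><body>"
      +   "<table border='1'>"
      +   "<tr><th>Name</th><th>Species</th><th>Habitat</th>"
      +   "<th>Temperament</th><th>Diet</th></tr>"
      +   "<xsl:apply-templates/>"
      +   "</table></body></html>"
      + "</xsl:template>"
      // 2. One table row per animal; only food or foodRecipe is delegated
      + "<xsl:template match='inventory/animal'>"
      +   "<tr><td><xsl:value-of select='name'/></td>"
      +   "<td><xsl:value-of select='species'/></td>"
      +   "<td><xsl:value-of select='habitat'/></td>"
      +   "<td><xsl:value-of select='temperament'/></td>"
      +   "<td><xsl:apply-templates select='food|foodRecipe'/></td></tr>"
      + "</xsl:template>"
      // 3. Nested table for a recipe, one row per ingredient
      + "<xsl:template match='foodRecipe'>"
      +   "<table><tr><td><em><xsl:value-of select='name'/></em></td></tr>"
      +   "<xsl:for-each select='ingredient'>"
      +     "<tr><td><xsl:value-of select='.'/></td></tr>"
      +   "</xsl:for-each></table>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical sample data: one animal with food, one with a foodRecipe
    public static final String XML =
        "<inventory>"
      + "<animal animalClass='mammal'><name>Song Fang</name>"
      +   "<species>Giant Panda</species><habitat>China</habitat>"
      +   "<food>Bamboo</food><temperament>Friendly</temperament>"
      +   "<weight>45.0</weight></animal>"
      + "<animal animalClass='mammal'><name>Cocoa</name>"
      +   "<species>Gigantic Koala</species><habitat>Australia</habitat>"
      +   "<foodRecipe><name>Yummy Mix</name><ingredient>Leaves</ingredient>"
      +   "<ingredient>Flowers</ingredient></foodRecipe>"
      +   "<temperament>Mellow</temperament><weight>45.0</weight></animal>"
      + "</inventory>";

    public static String render() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

The food text (Bamboo) reaches the output through the default rules, with no template of its own, while the weight never appears because the animal template neither prints it nor delegates to it.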
Now that we have a stylesheet, let’s apply it! The following simple
program, XSLTransform, uses the javax.xml.transform package to apply the
stylesheet to an XML document and print the result. You can use it to
experiment with XSL and our example code.
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;

    public class XSLTransform
    {
        public static void main( String [] args ) throws Exception
        {
            // Expect the stylesheet first, then the XML input
            if ( args.length < 2 || !args[0].endsWith(".xsl") ) {
                System.err.println("usage: XSLTransform file.xsl file.xml");
                System.exit(1);
            }
            String xslFile = args[0], xmlFile = args[1];

            // Build a Transformer from the stylesheet
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer( new StreamSource( xslFile ) );

            // Transform the XML input and write the result to standard output
            StreamSource xmlsource = new StreamSource( xmlFile );
            StreamResult output = new StreamResult( System.out );
            transformer.transform( xmlsource, output );
        }
    }
Run XSLTransform, passing the XSL stylesheet and XML input, as in the
following command:

    % java XSLTransform zooinventory.xsl zooinventory.xml > zooinventory.html
The output should look like Figure 24-2.
Constructing the transform is a similar process to that of getting
a SAX or DOM parser. The difference from our earlier use of the TransformerFactory
is that this time, we construct the transformer, passing it
the XSL stylesheet source. The resulting Transformer
object is then a dedicated machine that knows how to take input
XML and generate output according to its rules.
One important thing to note about the Transformer used in XSLTransform
is that it is not guaranteed to be thread-safe. In our example, we run the
transform only once. If you are planning to run the same transform many
times, you should take the additional step of getting a Templates object
for the transform first, then using it to create Transformers. A
Templates object, unlike a Transformer, may be shared safely among
threads.
    Templates templates =
        factory.newTemplates( new StreamSource( args[0] ) );
    Transformer transformer = templates.newTransformer();
The Templates object holds the parsed representation of the stylesheet
in a compiled form and makes the process of getting a new Transformer
much faster. The transformers themselves may also be more highly
optimized in this case. The XSLT implementation bundled with the JDK
(XSLTC) actually generates bytecode for very efficient “translets” that
implement the transform. This means that instead of the transformer
reading a description of what to do with your XML, it actually produces a
small compiled program to execute the instructions!
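The compile-once, transform-many pattern might be sketched as follows. The class TemplatesReuseDemo, its methods, the tiny stylesheet, and the sample documents are assumptions for illustration, and the parallel stream is just one convenient way to let several threads share the same Templates object.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesReuseDemo {
    // Tiny illustrative stylesheet: output the first animal's name
    public static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:template match='/'>"
      +   "<xsl:value-of select='inventory/animal/name'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Parse and compile the stylesheet once
    public static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each call creates its own Transformer, so calls never share one
    public static String render(Templates templates, String xml) {
        try {
            Transformer t = templates.newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // The parallel stream may spread the documents over several threads
    public static List<String> renderAll(Templates templates, List<String> docs) {
        return docs.parallelStream()
                   .map(doc -> render(templates, doc))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Templates templates = compile(XSL);
        List<String> docs = Arrays.asList(
            "<inventory><animal><name>Song Fang</name></animal></inventory>",
            "<inventory><animal><name>Cocoa</name></animal></inventory>");
        System.out.println(renderAll(templates, docs)); // [Song Fang, Cocoa]
    }
}
```

The key design point is that the shared state lives in the Templates object, while each unit of work gets a fresh, cheap Transformer of its own.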
With our XSLTransform
example, you can see how you’d go about rendering XML to an HTML
document on the server side. But as mentioned in the introduction,
modern web browsers support XSL on the client side as well. Browsers can
automatically download an XSL stylesheet and use it to transform an XML
document. To make this happen, just add a standard XSL stylesheet
reference in your XML. You can put the stylesheet directive next to your
DOCTYPE declaration in the zooinventory.xml
file:
    <?xml-stylesheet type="text/xsl" href="zooinventory.xsl"?>
As long as the zooinventory.xsl file is available at the same location (base URL) as the zooinventory.xml file, the browser will use it to render HTML on the client side.