XML Publishing with AxKit
By Kip Hampton
June 2004
Pages: 216

Table of Contents

Chapter 1: XML as a Publishing Technology
In the early days of the commercial Web, otherwise reasonable and intelligent people bought into the notion that simply having a publicly available web site was enough. Enough to get their company noticed. Enough to become a major player in the global market. Enough to capture that magical and vaguely defined commodity called market share. Somehow that would be enough to ensure that consumers and investors would pour out bags of money on the steps of company headquarters. In those heady days, budgets for web-related technologies appeared limitless, and the development practices of the time reflected that—it seemed perfectly reasonable to follow the celebration of a site's rollout with initial discussions about what the next version of that site would look like and do. (Sometimes, the next redesign was already in the works before the current redesign was even launched.) It did not matter, technically, that a site was largely hardcoded and inflexible, or that the scripts that implemented the dynamic applications were messy and impossible to maintain over time. What mattered was that the project was done quickly. If a few bad choices were made along the way, it was thought, they could always be addressed during the inevitable redesign.
Those days are gone.
The goldrush mentality has receded and companies and other organizations are looking for more from their investment in the Web. Simply having a site out there is not enough (and truly, it never was). The site must do something that measurably adds value to the organization and that value must exceed the cost of developing the site in the first place. In other words, the New Economy had a rather abrupt introduction to the rules of Business As Usual. This industry-wide belt-tightening means that web developers must adjust their approach to production. Companies can no longer afford to write off the time and energy invested in developing a web site simply to replace it with something largely similar. Developers are expected to provide dynamic, malleable solutions that can evolve over time to include new content, dynamic features, and support for new types of client software. In short, today's developers are being asked to do more with less. They need tools that can cope with major changes to a site or an application without altering the foundation that is already there.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Exploding a Few Myths About XML Publishing
XML and its associated technologies have generated enormous interest. XML pundits describe in florid terms how moving to XML is the first step toward a Utopian new Web, while well-funded marketing departments churn out page after page of ambiguous doublespeak about how using XML is the cure for everything from low visitor traffic to male-pattern baldness. While you may admire visionary zeal on the one hand and understand the simple desire to generate new business on the other, the unfortunate result is that many web developers are confused about what XML is and what it is good for. Here, I clear up a few of the more common fallacies about XML and its use as a web-publishing technology.
Using XML means having to memorize a pile of complex specifications.
There is certainly no shortage of specifications, recommendations, or white papers that describe or relate to XML technologies. Developing even a cursory familiarity with them all would be a full-time job. The fact is, though, that many of these specifications describe only a single application of XML. Unless that application solves a specific existing need, there's no reason to try to use it, especially if you come to XML from an HTML background. A general introduction to XML's basic rules, and perhaps a quick tutorial or two that covers XSLT or another transformative tool, are all you need to be productive with XML and a tool such as AxKit. Be sane. Take a pragmatic approach: learn only what you need to deliver on the requirements at hand.
Moving to XML means throwing away all the tools and techniques that I have learned thus far.
XML is simply a way to capture data, nothing more. No tool is appropriate for all cases, and knowing how to use XML effectively simply adds another tool to your bag of tricks. Additionally (as you will see in Chapter 9), many tools you may be using today can be integrated seamlessly into AxKit's framework. You can keep doing what worked well in the past while taking advantage of what AxKit may offer in the way of additional features.
XML Basics
Markup technology has a long and rich history. In the 1960s, while developing an integrated document storage, editing, and publishing system at IBM, Charles Goldfarb, Edward Mosher, and Raymond Lorie devised a text-based markup format. It extended the concepts of generic coding (block-level tagging that was both machine-parsable and meaningful to human authors) to include formal, nested elements that defined the type and structure of the document being processed. This format was called the Generalized Markup Language (GML). GML was a success, and as it was more widely deployed, the American National Standards Institute (ANSI) invited Goldfarb to join its Computer Languages for Text Processing committee to help develop a text description standard based on GML. The result was the Standard Generalized Markup Language (SGML). In addition to the flexibility and semantic richness offered by GML, SGML incorporated concepts from other areas of information theory; perhaps most notably, inter-document link processing and a practical means to programmatically validate markup documents by ensuring that the content conformed to a specific grammar. These features (and many more) made SGML a natural and capable fit for larger organizations that needed to ensure consistency across vast repositories of documents. By the time the final ISO SGML standard was published in 1986, it was in heavy use by bodies as diverse as the Association of American Publishers, the U.S. Department of Defense, and the European Laboratory for Particle Physics (CERN).
In 1990, while developing a linked information system for CERN, Tim Berners-Lee hit on the notion of creating a small, easy-to-learn subset of SGML. It would allow people who were not markup experts to easily publish interconnected research documents over a network, specifically the Internet. The Hypertext Markup Language (HTML) and its sibling network technology, the Hypertext Transfer Protocol (HTTP), were born. Four years later, after widespread and enthusiastic adoption of HTML by academic research circles throughout the globe, Berners-Lee and others formed the World Wide Web Consortium (W3C) in an effort to create an open but centralized organization to lead the development of the Web.
Publishing XML Content
In the most general sense, delivering XML documents over the Web is much the same as serving any other type of document—a client application makes a request over a network to a server for a given resource, the server then interprets that request (URI, headers, content), returns the appropriate response (headers, content), and closes the connection. However, unlike serving HTML documents or MP3 files, the intended use for an XML document is not apparent from the format (or content type) itself. Further processing is usually required. For example, even though most modern web browsers offer a way to view XML documents, there is no way for the browser to know how to render your custom grammar visually. Simply presenting the literal markup or an expandable tree view of the document's contents usually communicates nothing meaningful to the user. In short, the document must be transformed from the markup grammar that best fits your needs into the format that best fits the expectations of the requesting client.
This separation between the source content and the form in which it will be presented (and the need to transform one into the other) is the heart and soul of XML publishing. Not only does making a clear distinction between content and presentation allow you to use the grammar that best captures your content, it provides a clear and logical path toward reusing that content in novel ways without altering the data's source. Suppose you want to publish the poems from the collection mentioned in the previous section as HTML. You simply transform the documents from the poemsfrag grammar into the grammar that an HTML browser expects. Later, if you decide that PDF or PostScript is the best way to deliver the content, you only need to change the way the source is transformed, not the source itself. Similarly, if your XML expresses more record-oriented data—generated from the result of an SQL query, for example—the separation between content and presentation offers a way to provide the data through a variety of interfaces just by changing the way the markup is transformed.
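As a rough illustration of this separation, the sketch below renders one small XML source in two delivery formats without touching the source. It uses Python's standard library rather than AxKit, and the `poem` grammar, the sample text, and the `to_html` and `to_text` helpers are all invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; the element names are illustrative only.
SOURCE = """<poem>
  <title>Untitled</title>
  <line>First line of the poem</line>
  <line>Second line &amp; last</line>
</poem>"""


def to_html(root):
    """Render the poem as an HTML fragment for a browser."""
    title = escape(root.findtext("title", default=""))
    lines = "<br/>".join(escape(line.text or "") for line in root.iter("line"))
    return f"<h1>{title}</h1><p>{lines}</p>"


def to_text(root):
    """Render the same poem as plain text."""
    title = root.findtext("title", default="")
    body = "\n".join(line.text or "" for line in root.iter("line"))
    return f"{title}\n{'-' * len(title)}\n{body}"


root = ET.fromstring(SOURCE)
html_output = to_html(root)
text_output = to_text(root)
print(html_output)
print(text_output)
```

Adding a third delivery format here would mean writing one more rendering function; the source string stays exactly as it is.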
Introducing AxKit, an XML Application Server for Apache
Originally conceived in 2000 by Matt Sergeant as a Perl-powered alternative to the then Java-centric world of XML application servers, AxKit (short for Apache XML Toolkit) uses the mod_perl extension to the Apache HTTP server to turn Apache into an XML publishing and application server. AxKit extends Apache by offering a rich set of server configuration directives designed to simplify and automate common tasks associated with publishing XML content, selecting and applying transformative processes to XML content to deliver the most appropriate result.
Using AxKit's custom directives, content transformations (including chains of transformations) can be applied based on a variety of conditions (request URI, aspects of the XML content, and much more) on a resource-by-resource basis. Among other things, this provides the ability to set up multiple, alternate styles for a given resource and then select the most appropriate one at runtime. Also, by default, the result of each processing chain is cached to disk on the first request. Unless the source XML or the stylesheets in the chain change, all subsequent requests are served from the cache. Figure 1-5 illustrates the processing flow for a resource with one associated processing chain consisting of two transformations.
Figure 1-5: Basic two-stage processing chain
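The caching rule described above can be sketched in a few lines. This is not AxKit's implementation: comparing file modification times is an assumption made for the example, and `cache_is_fresh`, `touch`, and the file names are hypothetical.

```python
import os
import tempfile
import time


def cache_is_fresh(cache_path, dependency_paths):
    """Return True if the cached result is at least as new as every dependency."""
    if not os.path.exists(cache_path):
        return False
    cache_mtime = os.path.getmtime(cache_path)
    return all(os.path.getmtime(path) <= cache_mtime for path in dependency_paths)


def touch(path, mtime):
    """Create (or overwrite) a file and set its modification time explicitly."""
    with open(path, "w") as handle:
        handle.write("placeholder")
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "page.xml")
    stylesheet = os.path.join(workdir, "page.xsl")
    cached = os.path.join(workdir, "page.cache")
    now = time.time()

    touch(source, now - 300)
    touch(stylesheet, now - 200)
    first_request = cache_is_fresh(cached, [source, stylesheet])   # nothing cached yet

    touch(cached, now - 100)                                       # result written to disk
    second_request = cache_is_fresh(cached, [source, stylesheet])  # cache can be reused

    touch(stylesheet, now)                                         # stylesheet edited
    after_edit = cache_is_fresh(cached, [source, stylesheet])      # cache is stale again

print(first_request, second_request, after_edit)
```

The point of the sketch is only that the cached result depends on every input to the chain, so a change to any stylesheet invalidates it just as a change to the source does.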
In its design, AxKit implements a modular system that divides the low-level tasks required for serving XML data across a series of swappable component classes. For example, Provider classes are responsible for fetching the sources for the content and stylesheets associated with the current request, while Language modules implement interfaces to the various transformative processors. (You can find details of each type of component class in Chapter 8.) This modular design makes AxKit quite extensible and able to cope with heterogeneous publishing strategies. Suppose that some content you are serving is stored in a relational database. You need only swap in a Provider class that selects the appropriate data for those pages from the database, while still using the default filesystem-based Provider for static documents stored on the disk. Several alternative components of various classes ship with the core AxKit distribution, and many others are available via the Comprehensive Perl Archive Network. Often, little or no custom code needs to be written. You simply drop in the appropriate component and configure its options.
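The idea of swappable Provider components can be pictured with a small sketch. AxKit's real Provider classes are Perl modules with their own interface, which this excerpt does not show; the Python classes, the `get_source` method, and the `pages` table below are hypothetical stand-ins that only illustrate the design.

```python
import os
import sqlite3
import tempfile


class FileProvider:
    """Fetches source XML from a directory on disk."""

    def __init__(self, root):
        self.root = root

    def get_source(self, name):
        with open(os.path.join(self.root, name), encoding="utf-8") as handle:
            return handle.read()


class DatabaseProvider:
    """Fetches source XML from a table with (name, body) columns."""

    def __init__(self, connection):
        self.connection = connection

    def get_source(self, name):
        row = self.connection.execute(
            "SELECT body FROM pages WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return row[0]


def serve(provider, name):
    """The rest of the pipeline relies only on get_source()."""
    return provider.get_source(name).strip()


with tempfile.TemporaryDirectory() as docroot:
    with open(os.path.join(docroot, "about.xml"), "w", encoding="utf-8") as handle:
        handle.write("<page>static</page>\n")

    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE pages (name TEXT PRIMARY KEY, body TEXT)")
    connection.execute(
        "INSERT INTO pages VALUES (?, ?)", ("news.xml", "<page>from database</page>")
    )

    static_page = serve(FileProvider(docroot), "about.xml")
    dynamic_page = serve(DatabaseProvider(connection), "news.xml")
    connection.close()

print(static_page)
print(dynamic_page)
```

Because `serve` never asks where the content came from, swapping one provider for another requires no change to the code that consumes it.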
Chapter 2: Installing AxKit
AxKit combines the power of Perl's rich and varied XML processing facilities with the flexibility of the Apache web server. Rather than implementing such an environment in a monolithic package, as some application servers do, it takes a more modular approach. It allows developers to choose the lower-level tools such as XML parsers and XSLT processors for themselves. This neutrality with respect to lower-level tools gives AxKit the ability to adapt and incorporate new, better performing, or more feature-rich tools as quickly as they appear. That flexibility comes at a cost, however. You will probably have to install more than just the AxKit distribution to get a working system.
Installation Requirements
To get AxKit up and running, you will need:
  • The Apache HTTP server (Version 1.3.x)
  • The mod_perl Apache extension module (Version 1.26 or above)
  • An XML parser written in Perl or, more commonly, one written in C that offers a Perl interface module
  • The core AxKit distribution
If you are running an open source or open source-friendly operating system such as GNU/Linux or one of the BSD variants (including Mac OS X), chances are good that you already have Apache and mod_perl installed. If this is the case, then you probably will not have to install them by hand. Simply make sure that you are running the most recent version of each, and skip directly to the next section. However, in some cases, using precompiled binaries of Apache and mod_perl has proved problematic for people who want to use AxKit. In most cases, neither the binary in question nor AxKit is really broken. The problem is that binaries built for public distribution are usually compiled with a set of general build arguments that are not always well suited to specialized environments such as AxKit. If you find that all of AxKit's dependencies install cleanly, but AxKit's test suite still fails, you may consider removing the binary versions and installing Apache and mod_perl by hand. At the time of this writing, AxKit runs only under Apache versions in the 1.3.x branch. Support for Apache 2.x is currently in development. Given that Apache 2 is quite different from previous versions, both in style and substance, the AxKit development team decided to take things slowly to ensure that AxKit for Apache 2.x offers the best that the new environment has to offer.
To install Apache and mod_perl from the source, you need to download the source distributions for each from http://httpd.apache.org/ and http://perl.apache.org/
Installing the AxKit Core
Now that you have an environment for AxKit to work in and have some of the required dependencies installed, you are ready to install AxKit itself. For most platforms this is a fairly painless operation.
The quickest way to install AxKit is via Perl's Comprehensive Perl Archive Network (CPAN) and the CPAN shell. Log in as root (or become superuser) and enter the following:
$ perl -MCPAN -e shell
> install AxKit
This downloads, unpacks, compiles, and installs all modules in the AxKit distribution, as well as any prerequisite Perl modules you may need. If AxKit installs without error, you may safely skip to Section 2.4. If it doesn't, see Section 2.6 for more information.
The latest AxKit distribution can always be found on the Apache XML site at http://xml.apache.org/dist/axkit/. Just download the latest tarball, unpack it, and cd to the newly created directory. As root, enter the following:
$ perl Makefile.PL
$ make
$ make test
$ make install
This compiles and installs all modules in the AxKit distribution. Just like the CPAN shell method detailed above, AxKit's installer script automatically attempts to install any module prerequisites it encounters. If make stops this process with an error, skip on to Section 2.6 for help. Otherwise, if everything goes smoothly, you can skip ahead to Section 2.4.
In addition to the stable releases available from CPAN and axkit.org, the latest development version is available from the AxKit project's anonymous CVS archive:
cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic login
Brave souls who like to live on the edge or who may be interested in helping with AxKit development can check it out. When prompted for a password, enter: anoncvs. You may now check out a piping hot version of AxKit:
cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic co xml-axkit
Installing AxKit on Win32 Systems
As of this writing, AxKit's support for the Microsoft Windows environment should be considered experimental. Anyone who decides to put such a server into production does so at her own risk. AxKit will run in most cases. (Win9x users are out of luck.) If you are looking for an environment in which to learn XML web-publishing techniques, then AxKit on Win32 is certainly a viable choice.
If you do not already have ActiveState's Windows-friendly version of Perl installed, you must first download and install that before proceeding. It is available from http://www.activestate.com/. I suggest you get the latest version from the 5.8.x branch. In addition, you need the Windows port of the Apache web server. You can obtain links to the Windows installer from http://httpd.apache.org/. Be sure to grab the latest in the 1.3.x branch. Next, grab the official Win32 binaries for libxml2 and libxslt from http://www.zlatkovic.com/libxml.en.html and follow the installation instructions there.
After you install Apache, Perl, libxml2, and libxslt, you can install AxKit using ActiveState's ppm utility (which was installed when you installed ActivePerl). Simply open a command prompt, and type the following:
C:\> ppm
ppm> repository add theoryx http://theoryx5.uwinnipeg.ca/ppms
ppm> install mod_perl-1
ppm> install libapreq-1
ppm> install XML-LibXML
ppm> install XML-LibXSLT
ppm> install AxKit-1
Finally, add the following line to your httpd.conf and start Apache:
LoadModule perl_module modules/mod_perl.so
This combination of commands and packages should give you a workable (albeit experimental) AxKit on your Windows system. If things go wrong, be sure to join the AxKit user's mailing list and provide details about the versions of various packages you tried, your Windows version, and relevant output from your error logs.
Basic Server Configuration
As you will learn in later chapters, AxKit offers quite a number of runtime configuration options that allow fine-grained control over every phase of the XML processing and delivery cycle. Getting a basic working configuration requires very little effort, however. In fact, AxKit ships with a sample configuration file that can be included into Apache's main server configuration (or used as a road map for adding the configuration directives manually, if you decide to go that way instead).
Copy the example.conf file in the AxKit distribution's examples directory into Apache's conf directory, renaming it axkit.conf. Then, add the following to the bottom of your httpd.conf file:
# AxKit Setup
Include conf/axkit.conf
You now need to edit the new axkit.conf file to match the XML processing libraries that you installed earlier by uncommenting the AxAddStyleMap directives that correspond to tools you chose. For example, if you installed libxslt and XML::LibXSLT, you would uncomment the AxAddStyleMap directive that loads AxKit's interface to LibXSLT. Example 2-1 helps to clarify this.
Example 2-1. Sample axkit.conf fragment
# Load the AxKit core.
PerlModule AxKit

# Associates AxKit with a few common XML file extensions
AddHandler axkit .xml .xsp .dkb .rdf

# Uncomment to add XSLT support via XML::LibXSLT
# AxAddStyleMap text/xsl Apache::AxKit::Language::LibXSLT

# Uncomment to add XSLT support via Sablotron
# AxAddStyleMap text/xsl Apache::AxKit::Language::Sablot

# Uncomment to add XPathScript Support
# AxAddStyleMap application/x-xpathscript Apache::AxKit::Language::XPathScript

# Uncomment to add XSP (eXtensible Server Pages) support
# AxAddStyleMap application/x-xsp Apache::AxKit::Language::XSP
The one hard-and-fast rule about configuring AxKit is that the PerlModule directive that loads the AxKit core into Apache via mod_perl must appear at the top lexical level of your httpd.conf file, or one of the files that it includes. All other AxKit configuration directives may appear as children of other configuration directive blocks in whatever way best suits your server policy and application needs, but the
Testing the Installation
AxKit's distribution comes with a fairly complete test suite that typically runs as part of the installation process. Running the make test command in the root of the AxKit source directory fires up a new instance of the Apache server on an alternate port with AxKit enabled. It then examines the output of a series of test requests made to that instance that exercise various aspects of AxKit's functionality. make test runs automatically by default if you are installing AxKit via the CPAN shell. If all test scripts pass during the make test process, you can be sure that you have a working AxKit installation and are ready to proceed.
In addition to the automated test suite, AxKit comes with a set of demonstration files that you can also use to test your new installation. To install the demo, copy the demo directory and its contents from the root of the AxKit distribution into an appropriate directory to which you have write access. The configuration file in the demo directory presumes that you will copy the demo directory into /opt/axkit. So if you choose another location, be sure to edit all paths in the demo's axkit.conf file to reflect your choice.
Before the demo will work, you need to include the axkit.conf contained in the new demo directory into your server's httpd.conf file. For example, if you installed the demo in /opt/axkit (again, the default), you would add the following:
# AxKit Demo
Include /opt/axkit/demo/axkit.conf
Start (or stop and restart) the Apache server and point a browser to http://localhost/axkit/. You should see a page congratulating you on your new AxKit installation. This page also presents a number of links that allow you to test AxKit's various moving parts. For example, if you chose to install libxslt and its Perl interface XML::LibXSLT to use as an XSLT processor, you would click on the XSLT demos, using the XML::LibXSLT link to verify that AxKit works and is configured properly to use those libraries to transform XML documents, as shown in Figure 2-1.
Installation Troubleshooting
As I mentioned in this chapter's introduction, AxKit's core consists largely of code that glues other things together. In practice, this means that most errors encountered while installing AxKit are due to external dependencies that are missing, broken, out of date, or invisible to AxKit's Makefile. Including a complete list of various errors that may be encountered among AxKit's many external dependencies is not realistic here. It would likely be outdated before this book is on the shelves. In general, though, you can use a number of compile-time options when building AxKit. They will help you diagnose (and in many cases, fix) the cause of the trouble. AxKit's Makefile.PL recognizes the following options:
DEBUG=1
This option causes the Makefile to produce copious amounts of information about each step of the build process. Although wading through the sheer amount of data this option produces can be tedious, you can diagnose most installation problems (missing or unseen libraries, etc.) by setting this flag.
NO_DIRECTIVES=1
This option turns off AxKit's Apache configuration directives, which means you must set these via Apache's PerlSetVar directives instead. Use this option only in extreme cases in which AxKit's custom configuration directives conflict with those of another Apache extension module. (These cases are very rare, but they do happen.)
EXPAT_OPTS=" . . . "
This option is relevant only if you do not have the Expat XML parser installed and decide to install it when installing AxKit. This argument takes a list of options to be passed to libexpat's ./configure command. For example, EXPAT_OPTS="--prefix=/usr" installs libexpat in /usr/lib, rather than the default location.
Chapter 3: Your First XML Web Site
With AxKit installed, you can begin putting it through its paces. In this chapter, we create a simple XML-based web site. Along the way, I will introduce AxKit's facilities for applying stylesheets to transform data marked up in XML into a commonly used delivery format, combining XML from different sources, and configuring an alternate style transformation to deliver the same XML content in a different format in response to data received from the requesting client.
Preparation
By design, XML processing tools are less forgiving about what they accept than the HTML browsers that you may be used to working with. Omitting a closing tag when creating an element in an HTML page, for example, may cause an undesirable result when the page is rendered, but the browser usually tries to recover gracefully and render something for you to see. In contrast, omitting an end tag when creating an element in a document that an XML parser will consume results in a fatal well-formedness error, and no such recovery is possible. In the context of AxKit (in which all XML processing happens on the server), this means that if you pass in a bad document, AxKit sends no content to the client. At best, you see an error message that indicates where things went wrong. To avoid frustration, take a little time to familiarize yourself with the XML processing tools available to you. At the very least, investigate how the XML parser you installed can be used from the command line to verify a document's well-formedness and validity. Being able to catch bad documents going in reduces the overall number of potentially user-visible errors. The ability to verify that your content and stylesheets are at least syntactically correct can make finding the cause of an error easier.
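To make the difference concrete, here is a minimal well-formedness check. It uses the XML parser in Python's standard library rather than the parser you installed for AxKit, and the `check_well_formed` helper and both sample strings are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return str(error)
    return None


good = "<para>All tags are <italic>closed</italic>.</para>"
bad = "<para>The italic tag is <italic>never closed.</para>"

print(check_well_formed(good))  # no error to report
print(check_well_formed(bad))   # parser message locating the problem
```

An HTML browser would probably still display something for the second string; an XML parser refuses it outright. Note that this checks well-formedness only, not validity against a DTD or schema.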
Even more than with a static HTML-based site, starting with a good directory structure is key to creating an easy-to-maintain XML-based site. The time and labor-saving benefits of having predictable paths for images, CSS stylesheets, etc. also apply to the files associated with XML processing. It's a good idea to get in the habit of creating a stylesheets (or similarly named) directory at the base of the host's DocumentRoot when you start a new project.
If you installed the AxKit demonstration site or included the sample axkit.conf in your main Apache configuration (covered in Section 2.4), you do not need to alter the web server's configuration at all. If not, follow the directions there, or add the following lines to the web server's httpd.conf,
Creating the Source XML Documents
Often, many benefits of using an XML publishing framework such as AxKit become obvious only later in a project's life (e.g., the ability to easily add new heavy-duty features to an existing site, or the power to completely change the look and feel of an entire site without touching its content). Given this, any examples you may choose for this introduction will surely fall short of illustrating AxKit's real power. Accepting the notion that the task at hand is a bit absurd frees you to have a little fun with it while still learning the basics. Let's run with the absurdity, and imagine that you are charged with the task of publishing a small site on the very silly subject of cryptozoology.
Cryptozoology (literally, the study of hidden animals) is concerned with the gathering and analysis of data related to animals that are frequently reported by local residents or found in popular folklore, but whose existence the scientific community has not formally recognized. Familiar examples include the Yeti, Loch Ness Monster, and Mokele-Mbembe.
The first document for your site, cryptozoo.xml, contains a list of cryptozoological species (called cryptids by insiders). (See Example 3-1.)
Example 3-1. cryptozoo.xml
<?xml version="1.0"?>
<cryptids>
  <species>
    <name>Jackalope</name>
    <habitat>Western North America</habitat>
    <description>
      <para>
        Similar to the Bavarian raurackl (stag-hare), the
        North American Jackalope resembles a large jackrabbit
        with small, deer-like antlers. This vicious
        carnivore is frequently mistaken for common rabbits or hares
        suffering from <italic>papillomatosis</italic> (a condition
        that produces horn-like growths on the head in those species).
      </para>
    </description>
  </species>
  <species>
    <name>Dahut</name>
    <habitat>French Alps</habitat>
    <description>
      <para>
        A shy relative of the Alpine deer, the Dahut has
        adapted to the challenges of its mountainous habitat by
        growing legs that are considerably longer on one side
        of its body. While this asymmetrical limb configuration allows
        for level grazing on steep grades, it leaves the unfortunate
        creature unable to reverse its course. Local hunters exploit
        this weakness by sneaking up behind the Dahut and either
        whistling softly or crying "Dahut!"; when the startled
        creature turns to face its assailant, it finds its
        longer legs on the wrong side and it tumbles to its doom.
      </para>
    </description>
  </species>
  <!--  . . . more species here -->
</cryptids>
Additional content appearing in this section has been removed.
Writing the Stylesheet
So far, you have three XML documents that contain three very different, but randomly overlapping, grammars. (The species and name elements appear in different roles in the two main content documents.) Your goal is to make this information available on the Web to HTML browsers. You want to reach the widest possible audience, and that means maintaining the lowest possible expectations of the requesting client's capabilities. That is, you cannot rely on everyone who wants to read your pages having a thoroughly modern browser capable of doing appropriate client-side transformations to your XML documents via CSS or XSLT. You must deliver basic HTML if you expect your data to be widely accessible.
With this in mind, you need a way to transform the disparate data structures contained in each of your XML documents into the unified grammar of simple HTML. That's where AxKit's transformational languages and stylesheets enter the picture. AxKit offers many ways to transform XML data. (We will examine the merits of many of these in later chapters.) In this example, we examine how you can transform your cryptozoology documents into HTML using two of the more popular transformation languages: XSLT and XPathScript.
I will save the examination of the lower-level details of these languages for later. At this point, it suffices to understand that both XSLT and XPathScript offer a declarative syntax that provides a way to create new documents by applying transformations to all or some of the elements, attributes, and other content that an existing XML document contains.
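As a small taste of that declarative style, here is a minimal XSLT sketch for the cryptozoo.xml document from Example 3-1. It is not the full stylesheet discussed next; it only assumes the element names shown in that example (cryptids, species, name, habitat, description, para, and italic), and the page title and HTML layout are arbitrary choices made for illustration.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: element names come from Example 3-1,
     the HTML layout and page title are assumptions -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:output method="html"/>

<!-- Wrap the whole list of cryptids in a basic HTML page -->
<xsl:template match="/">
  <html>
    <head><title>Cryptids</title></head>
    <body>
      <xsl:apply-templates select="cryptids/species"/>
    </body>
  </html>
</xsl:template>

<!-- Each species becomes a heading, a habitat line, and its description -->
<xsl:template match="species">
  <h2><xsl:value-of select="name"/></h2>
  <p>Habitat: <xsl:value-of select="habitat"/></p>
  <xsl:apply-templates select="description/para"/>
</xsl:template>

<!-- Paragraphs map to HTML p elements; their children are processed in turn -->
<xsl:template match="para">
  <p><xsl:apply-templates/></p>
</xsl:template>

<!-- The custom italic element maps to HTML's i element -->
<xsl:template match="italic">
  <i><xsl:apply-templates/></i>
</xsl:template>

</xsl:stylesheet>
```

Each template declares what the output should look like for one kind of source element, rather than spelling out how to walk the document; the processor takes care of matching templates to nodes.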
Rather than taking small steps through the XSLT stylesheet, I present it here in one block to give you an idea of what a full, working stylesheet looks like. (See Example 3-4.) Do not worry if much of it seems foreign; we will look at the syntactic elements of XSLT in more detail in Chapter 5.
As you read through the stylesheet, keep in mind that:
Additional content appearing in this section has been removed.
Associating the Documents with the Stylesheet
AxKit offers a variety of configuration options for associating documents with its various language processors. Chapter 4 covers each in detail. In Example 3-5, you create an .htaccess file in the same directory as your XML documents. It defines a default style for AxKit to use when processing documents in this directory.
Example 3-5. A simple .htaccess file
<AxStyleName "#default">
  AxAddProcessor text/xsl stylesheets/cryptozoo.xsl
</AxStyleName>
Pay attention to the arguments passed to the AxAddProcessor directive. The first is the MIME type that AxKit examines to decide which language processor modules to use, and the second is the DocumentRoot-relative path to the stylesheet that will be passed to that language processor to transform your XML documents. If you want to use your XPathScript stylesheet rather than the XSLT, you would use AxAddProcessor application/x-xpathscript stylesheets/cryptozoo.xps instead. This processor definition is wrapped in an AxStyleName block. This directive block, in turn, combines the processor definitions it contains into a single "named style" that a StyleChooser or other plug-in can select at runtime. By giving this style the special name #default, you are configuring AxKit to use this style as a fallback if no other style is explicitly selected.
It's time to fire up a web browser and check the results of your work. A request to http://myhost.tld/cryptozoo.xml yields what is shown in Figure 3-1.
Figure 3-1: cryptozoo.xml rendered as HTML
Clicking on the Sightings link reveals what is shown in Figure 3-2.
Figure 3-2: cryptid_sightings.xml rendered as HTML
Additional content appearing in this section has been removed.
A Step Further: Syndicating Content
You have reached your initial goal of publishing your XML documents for consumption by HTML browsers on the Web using AxKit. Even if that were all you ever wanted to do, you still made a clear division between the content you will maintain and the way in which it is presented. Among other benefits, you can now redesign the look and feel of pages sent to the client without touching content documents. Don't worry about clobbering or obscuring essential information just to change the way it renders in a browser. Similarly, using a custom XML grammar for your content means that the documents themselves can unambiguously define the intended roles of the data they contain, rather than the way that data may be represented on the visual medium of an HTML browser. This makes reusing the data for other purposes a lot easier.
To understand the practical benefits of separating content from presentation, suppose that your list of cryptid sightings becomes wildly popular on the Web. People start asking for a way to put links to the newly reported sightings on their own cryptozoology sites. You could tell them to screen-scrape the HTML list. Instead, you decide to be a good information-sharing citizen and make the list available as an RSS syndication feed. To achieve this, the first thing you need is a stylesheet that transforms the list of cryptid sightings to RSS, in addition to the one you already have that transforms the data into HTML.
For those who may not be familiar with it, RSS (RDF Site Summary, Rich Site Summary, or Really Simple Syndication, depending on whom you ask) is a popular XML grammar used for syndicating online content, especially news headlines. Most weblogs use RSS as the means to both publish content and share links with other bloggers, and many weblog tools store their data natively as RSS. (See Example 3-6.) For more information about RSS and some of its more creative uses, see Ben Hammersley's Content Syndication with RSS (O'Reilly).
Example 3-6. cryptidsightings_rss.xsl
Additional content appearing in this section has been removed.
Chapter 4: Points of Style
Style is an important concept in the AxKit world. Much of the value that AxKit adds as an XML publishing and application server lies in the flexibility and ease with which documents can be associated with one or more sets of transformations that can be selected dynamically in response to a condition. From more common web-publishing tasks, such as vendor co-branding and user customization, to more advanced dynamic data-oriented applications, the key to developing clean, manageable sites with AxKit lies in learning how to apply the appropriate styles to the content to meet the specific need. In this chapter, I introduce AxKit's basic styling configuration options and show how these options can be combined to create sophisticated, responsive sites.
Adding Transformation Language Modules
Before you can begin associating documents with the stylesheets that will be used to transform them, you must tell AxKit which lower-level processors to use to perform those transformations. In AxKit, access to various transformative processors is provided by its Language modules. Many of these modules create a bridge between AxKit and an existing XML processing tool. For example, the Apache::AxKit::Language::LibXSLT module gives AxKit access to Perl's XML::LibXSLT interface and, hence, to the Gnome Project's XSLT processing library, libxslt. Other Language modules, such as AxKit's implementation of eXtensible Server Pages, Apache::AxKit::Language::XSP, are unique to AxKit and implement both the interface that allows them to be added to the AxKit processing chain and the code that actually processes the XML content. The core AxKit distribution contains several such Language modules:
Apache::AxKit::Language::LibXSLT
Adds support for the Gnome Project's libxslt processor; used to transform documents with XSLT
Apache::AxKit::Language::Sablot
An alternate XSLT transformer using the Sablotron XSLT processor from The Ginger Alliance
Apache::AxKit::Language::XPathScript
Adds support for a more Perlish alternative to XSLT, XPathScript
Apache::AxKit::Language::XSP
Provides an interface to AxKit's implementation of the eXtensible Server Pages (XSP)
Apache::AxKit::Language::SAXMachines
Provides an AxKit interface to Barrie Slaymaker's popular XML::SAX::Machines Perl module, which offers an easy way to set up chains of SAX Filters to transform XML content
Additional content appearing in this section has been removed.
Defining Style Processors
The most basic component used to define which transformations will be applied to a given resource within AxKit is perhaps best termed a style processor definition. These definitions indicate a single transformational step (applying an XSLT stylesheet or passing the content through a SAX Filter, for example) in what may be a chain of transformations. In their most basic and typical form, these style definitions declare two bits of crucial information: a MIME type that will be used by AxKit to determine which Language module will be used to transform the content, and a file path to a stylesheet (or other Language-specific argument) that will be used by that Language module to determine how to transform the content. Individual processor definitions may optionally be combined into named style and media groups that can then be selected conditionally, based on a number of factors.
By default, style processors are defined within AxKit in one of two ways: by using AxKit's processor configuration directives or by special stylesheet processing instructions contained in the source documents themselves. (In Chapter 8, you'll learn to create your own way to configure AxKit's styling rules by rolling a custom ConfigReader module, but that's another story.) The following illustrates how to create a simple named style containing a single style definition using AxKit's server configuration directives. The lone processor definition contains the required MIME type and path to the stylesheet that will be applied.
<AxStyleName "#default">
    AxAddProcessor text/xsl /path/to/style1.xsl
</AxStyleName>
The MIME type used in the processor definition corresponds to the one you associated earlier with the Language::LibXSLT module. The effect is that the source XML is transformed by the LibXSLT processor, which applies the stylesheet at the location given by the second argument.
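The other default mechanism mentioned above, a stylesheet processing instruction in the source document itself, might look like the following sketch. It uses the standard xml-stylesheet processing instruction; the stylesheet path and the one-element document are placeholders chosen for illustration, not files from a real site.

```xml
<?xml version="1.0"?>
<!-- Hypothetical path; the type matches the MIME type used in the
     processor definition above -->
<?xml-stylesheet href="/path/to/style1.xsl" type="text/xsl"?>
<root>Base content</root>
```

Here the document itself names the stylesheet and the MIME type, so the association travels with the content rather than living in the server configuration.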
AxKit offers a small host of runtime configuration directives that can be used to define the styles for a given site. These directives are actually extensions of Apache's configuration syntax and, as such, are added to the usual server configuration files, such as
Additional content appearing in this section has been removed.
Dynamically Choosing Style Transformations
The options that you've looked at so far for associating documents with transformative processes are quite capable. Often, stylesheet processing instructions or simple AxKit processor definitions, combined with the constraints imposed by Apache's built-in <Files>, <Directory>, <Location>, and similar block-level directives, are all that many sites require. However, AxKit offers even more flexibility by providing additional mechanisms that allow you to combine these low-level style processing options into logical groups that can be selected at runtime. In this section, I introduce the concepts and syntax for creating named styles and media types and explain how these can be used in conjunction with the StyleChooser and MediaChooser modules to apply just the right content transformations under the right circumstances.
The reasons for using named style and media blocks are quite varied. Generally, they are best suited for cases when you need to select a transformation (or a chain of transformations) based on a condition external to the properties of the source XML content itself. Some reasons to use named styles and media blocks include:
Vendor branding
Your site offers a service, and each customer wants the content presented in a way that matches his unique look and feel.
User-selected skinning
One size never fits all. You want to offer your visitors the ability to select the style that suits them best.
Automated metadata extraction
You want to extract important metadata, like abstract summaries, author, title, copyright information, etc., by simply applying an alternative set of styles to your documents.
Role-specific data transformations
Your customers, vendors, shipping department, and company president all have different needs when they look at your product list. You want to serve all of them from the same XML data source.
Additional content appearing in this section has been removed.
Style Processor Configuration Cheatsheet
So far, we have looked at the various individual elements that can be used to control how AxKit applies style transformations. There are so many, in fact, that seeing how these parts all fit together may be a bit tough. For example, a named style block (selected by a StyleChooser based on an environmental condition) may contain one or more AxAddDTDProcessor or similar conditional processing directives that are only applied if an additional condition is met. True AxKit mastery comes from knowing how to combine all its various configuration options to create elegant styling rules that meet the needs of your specific application.
To help examine the processing order for various configuration combinations, we will create a series of very simple XSLT stylesheets whose sole purpose is to show the order in which AxKit applies a given style. The stylesheet in Example 4-2, alpha.xsl, simply appends the string . . . Alpha processed to the text of the top-level root element of the document being processed.
Example 4-2. alpha.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  version="1.0">

<xsl:template match="/">
  <root><xsl:value-of select="/*"/> . . . Alpha processed</root>
</xsl:template>

</xsl:stylesheet>
The tiny sample XML document used for the transformations is shown in Example 4-3.
Example 4-3. minimal.xml
<?xml version="1.0"?>
<root>Base content</root>
Add more stylesheets to these—beta.xsl, gamma.xsl, and so on—that do more or less the same thing—that is, adding . . . Beta processed, etc., to the text of the root element. Wherever a simple description does not suffice, use these stylesheets to examine the precise processing order based on the returned result.
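For instance, beta.xsl could be sketched as follows. It mirrors alpha.xsl from Example 4-2 exactly, and only the appended marker text differs.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<!-- Same pattern as alpha.xsl: copy the root element's text,
     then append this stylesheet's own marker -->
<xsl:template match="/">
  <root><xsl:value-of select="/*"/> . . . Beta processed</root>
</xsl:template>

</xsl:stylesheet>
```

Because each stylesheet copies the text it receives and adds its own marker to the end, the markers accumulate in processing order. If alpha.xsl and then beta.xsl were applied to minimal.xml, the text of the root element would read Base content . . . Alpha processed . . . Beta processed.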
Wherever more than one style processing directive exists within a given context, each will be added (or, in the case of conditional processors, evaluated) in the order in which they appear in the configuration files. In cases in which directives are added to the same context by both the
Additional content appearing in this section has been removed.
Chapter 5: Transforming XML Content with XSLT
XSLT (The eXtensible Stylesheet Language: Transformations) is an XML application language used for transforming XML documents into other documents. It is implemented by an application called an XSLT processor that takes an XML document and an XSLT stylesheet as input and produces a new document by applying the instructions contained in the stylesheet to the original source XML document. The result of an XSLT transformation can be any text-based format, but the output is typically either another XML document, or a document in a widely deployed markup language such as HTML that can be readily consumed by a given client application.
AxKit is not an XSLT processor, nor does it ship with one. If you want to use XSLT to transform your XML content using AxKit, you need to install an XSLT processor and any necessary Perl interface modules separately. For the list of XSLT processors that AxKit currently supports, see XML Processing Options in Chapter 2. For details about how to use the AxAddStyleMap directive to associate XSLT stylesheets with the processor you install and the various directives that govern which stylesheets are applied to your XML documents, see Chapter 4.
Exhaustive coverage of XSLT is well beyond the scope of this chapter. The goal here is to introduce enough of the basic concepts of writing XSLT stylesheets to allow you to start being productive with AxKit as quickly as possible. For a more detailed look at XSLT, see XSLT, by Doug Tidwell, or Learning XSLT, by Mike Fitzgerald (both from O'Reilly). All of the samples here use only XSLT 1.0. At the time of this writing, XSLT 2.0 is still very new and not widely implemented, and existing implementations are highly experimental. Rest assured though, XSLT 1.0 is still a viable tool. The topics covered here generally apply to both versions, and support for use of Version 2.0 processors from within AxKit will be added just as soon as stable implementations begin to appear.
XSLT Basics
An XSLT stylesheet is made up of a single top-level xsl:stylesheet element that contains one or more xsl:template elements. These templates can contain literal elements that become part of the generated result, functional elements from the XSLT grammar that control such things as which parts of the source document to process, and often a combination of the two. The contents of the source XML document are accessed and evaluated from within the stylesheet's templates and other function elements using the XPath language. The following shows a few XSLT elements (the associated XPath expression is highlighted):
<xsl:value-of select="price"/>
<xsl:apply-templates select="/article/section"/>
<xsl:copy-of select="order/items"/>
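To see how such elements sit inside templates, consider the following minimal sketch. It assumes a hypothetical source document whose top-level article element contains a title and several section elements, each with its own title and para children; none of these names come from a real schema.

```xml
<?xml version="1.0"?>
<!-- Sketch for a hypothetical article document; all source element
     names here are assumptions made for illustration -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:template match="/">
  <html>
    <body>
      <!-- Literal result elements mixed with XSLT instructions -->
      <h1><xsl:value-of select="/article/title"/></h1>
      <!-- Hand each section to the template that matches it -->
      <xsl:apply-templates select="/article/section"/>
    </body>
  </html>
</xsl:template>

<!-- Inside this template, XPath expressions are evaluated
     relative to the current section element -->
<xsl:template match="section">
  <h2><xsl:value-of select="title"/></h2>
  <xsl:apply-templates select="para"/>
</xsl:template>

<xsl:template match="para">
  <p><xsl:value-of select="."/></p>
</xsl:template>

</xsl:stylesheet>
```

The html, body, h1, h2, and p elements are literal result elements that pass straight through to the output, while the xsl: elements use XPath in their select and match attributes to decide which parts of the source to pull in.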