XSL-FO
XSL-FO Making XML Look Good in Print

By Dave Pawson


Table of Contents

Chapter 1: Planning for XSL-FO
XSL-FO is a terrific technology for creating paginated print versions of information contained in XML documents, but it is only one ingredient in the overall information-publishing recipe. Deciding whether XSL-FO suits your needs and choosing which XSL-FO tools to use are first steps toward implementing applications of XSL-FO.
If you already have information stored in XML that you need to publish and an XSL-FO toolkit you're comfortable with, you might want to go on to the next chapter.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
XML and Document Processing
Individuals and organizations who need print output from computer-based content have many choices. Typically, these range from basic text editors to the high-end word processors that most people have through office suites. The high end of non-specialist tools is probably a desktop publishing package available for a few hundred dollars. This stretches the capabilities of the casual user, introducing concepts not available to word processor users. Within this toolset, the quality of output is generally sufficient for a large percentage of the documents that we see. Nevertheless, these tools have several important drawbacks.
The limits appear rapidly as the importance of volume, print quality, layout options, repeatability, and document organization increases. Within each of these areas, the effort needed to attain a desired output increases as more features are sought. When these limits are reached, organizations either outsource the work to professional printers or bring skills and an appropriate toolset in-house. The deciding factors vary between documents, users, financial limitations, the frequency of need, and accurate growth forecasts.
One key aspect of this decision — perhaps a sign that XSL-FO is appropriate — is whether repeatability is an issue. When a document is produced regularly, it becomes familiar in certain ways; its look and feel become recognizable. We may not be able to say exactly what those elements are, but if the magazine, report, or manual fails to align with style expectations, it is noticed. The content changes with each new issue, but the house style becomes established. In some cases, the house style is dictated by simple description: "The editorial cannot be more than 200 words." "We always have Anne's piece here." This repeatability and regularity form a key to processing and begin to drive input needs.
If you regularly read a report or newspaper, you begin to know what to expect where. This is one aspect of style as it applies to document preparation.
Styles need to be flexible, however. A common example of necessary flexibility is media creep. Someone may want to add another medium. A print document is no longer adequate, and the toolset that has been good enough for print is suddenly required to produce a web version, a version on compact disc, or an alternative medium accessible to nonprint users. This brings a critical question. Do we ask our present toolkit to produce this? Often, the answer should be no, though it may take a long time to come to this realization. Tools designed for one medium show their heritage when applied to other media.
Choosing Your Print Production Approach
When you create selection criteria, you should address the following questions. Is XML input available? What access do you have to expertise in any of these areas? What access do you have to other organizations that have chosen that particular path, and how well do their needs match yours? Is expertise available locally, or can you afford to import it? What are the timescales of the investment: are you expecting to use this toolset for a significant period or simply to meet a short-term need? What payback period are you allowed for such an investment? If a particular toolset is used, how will it fit in with other tools and technology that you already use? Are your print processes isolated or part of a larger publishing process? Will you fully own the process, or will some elements be outsourced, for instance, initial markup or final printing? If so, are the interfaces known and understood? What transformations (if any) are required as part of this process? For any particular toolchain, is there a good match with the personnel involved? How readily will they accept the new tools and the associated training? Is training readily available?
Your particular answers to these questions are first steps toward addressing print production concerns.
So when is XSL-FO a good choice? What can it provide that other tools can't? The primary benefit is its place as an XML language that enables the use of the increasing number of XML tools. XSL-FO takes XML as its input, and delivers print, today most commonly in Adobe's Portable Document Format (PDF) or PostScript. Microsoft's Rich Text Format (RTF) is also being targeted as a final form, with two implementations available. In between XML input and print is an intermediate document in the fo namespace. Future implementations may indeed provide other delivery forms as an endpoint in an XML-based toolchain in today's organizations. The FO vocabulary is primarily for the implementors and, in the future, may even become an invisible stage (to the end user) as more graphical tools become available.
XSL-FO has natural allies in XSLT and XPath, which were developed with XSL-FO. These two are widely implemented and perform the content selection that is a part of the final form generation. The combined power of these is enormous and still under-appreciated.
Choosing Tools
Choosing XSL-FO processors is still difficult. Although work on some of the processors has been underway for years, the Recommendation only became final in December 2001 and there were substantial changes along the way. You'll want to inspect tools closely and try them out if possible.
If, for example, you already use the Epic editor and wish to produce output using XSL-FO, that could be the perfect choice. If your present processes leave you more room for choice, know which criteria must be met, which should be met, and which are merely nice to have.
Next, ask yourself a few questions to further narrow the selection. What expertise do you have to apply? What level of support do you need from the supplier? What development options do you want, perhaps extending the formatter to account for your peculiar needs? Are you in a position to take what's given and use it within today's performance envelope? How simple are your needs?
The more straightforward your print requirements, the wider your choice. Are you in a position to use one of the open source developments, adding to that formatter as your needs dictate? If you don't have the expertise in-house, might you buy it? The number of proficient stylesheet authors around the globe is unlikely to exceed the low hundreds, and their availability for an in-house contract is questionable. Will remote support satisfy your needs? Can you negotiate a contract that includes updates for the initial period while the specification settles and interpretations are made public? These products are not necessarily complete yet. You will need someone capable of assessing each update.
The following sections discuss a couple of further issues to consider in detail.

Section 1.3.1.1: Price

Price is always a primary determining factor. Whose money are you spending: your own, your employer's, or your clients'? There are currently three commercial implementations with support. These are the most complete implementations. More partial implementations are available in open source form.
In any event, the development of a formatter is not trivial. The people involved in that work have expended a tremendous amount of effort in developing those products, so freely available or paid for, please don't ever think of them as cheap products. On the other hand, this is not a market in which you necessarily get what you pay for. Assess the product in terms of its capability, not its price.
The Future for XSL-FO
Like all other technologies, the success or failure of XSL-FO will be determined by user uptake and demand, implementation response to that demand, and so on. A recent development that I found hopeful was the start of work on more than one implementation of XSL-FO with RTF format being the target. Because Microsoft Word is so widely in use, the availability of that specific format has potential importance in terms of numbers and interest — perhaps not in the commercial sphere, but more in the home or office environment. I have no idea what will be the make or break points in the development of XSL-FO, but the option to produce Word documents for the office environment could be one of those.
The present focus on PDF as the deliverable format is pragmatic. PDF is one of a small number of formats that has been widely deployed, is readily available, is well known, and has the capability of browser integration for web delivery. Whether future implementations will maintain that pragmatic focus, I don't know, but alternatives are not abundant. Few, if any, typesetting languages have been opened up to exploitation in this field, perhaps with the exception of TeX, with its target of electronic typesetting. Perhaps electronic paper (rewritable sheets of plastic) will prove a natural medium for XSL-FO.
The need for paper-based delivery is, today, not in question. How that will be achieved in a multimedia-capable organization in a few years is still open to debate. Will XSL-FO be a preferred part of the delivery chain? What will help and hinder in making that choice? Tool availability, yes. Familiarity or access to the skills to develop the stylesheets? Yes, or maybe not. If the sort of visual tool that allows me to paint styles onto content becomes available, it should be possible to autogenerate the bulk of a stylesheet. Whether the impetus will be felt to develop such a tool depends on whether there's a market for it. One of the fascinating developments in the history of XSL is transformation. Once it became known that XSLT and XPath could produce HTML from XML, that swiftly overtook the original intention. Such a twist of fate has surprising impact. What other factors are likely to move XSL-FO into widespread use? Support networks? XSLT is extremely well supported via the Mulberrytech mailing list. One of the XML Usenet groups just about splits evenly between XML and XSLT. XSL-FO has a single, quiet list. Newsworthiness? XSL-FO is nowhere near as sexy as XML (either that or it doesn't have the support of people who are good at hyping a technology), which could influence its fate. We are unlikely to see XSL-FO streams in the mainstream conferences unless someone does some serious marketing.
Chapter 2: A First Look at XSL-FO
This chapter introduces the details of XSL-FO, including a look at XSL-FO markup and an explanation of how to produce print documents with XSLT and XSL-FO. You should have a basic understanding of XSL-FO processing by the end of the chapter, which will provide a foundation for learning the rest of XSL-FO.
An XSL-FO Overview
This section provides a high-level view of XSL-FO and its major parts, describes the process of getting from source to finished output, and describes some of the available tools. It introduces some of the necessary concepts (which will be expanded on later) and some of the jargon.
The production process starts with an XML document that you have been given or that you have created: the source XML. You take that document and apply an XSLT transformation (using an XSLT stylesheet) to select parts or all of the document content, and it produces an output XML document that uses the XSL-FO vocabulary. Let's call this output document the XSL-FO stylesheet. The XSL-FO stylesheet formatting instructions describe how the content of the document should be laid out for presentation to the end user. The formatting engine interprets the XSL-FO stylesheet to produce formatted output, often PDF, TeX, or some other print-ready form. This formatted document is then ready for use. This end-to-end process is shown in Figure 2-1.
Figure 2-1: The end-to-end process
Making this work requires some means of creating XML documents, an XSLT processor, and an XSL-FO formatter to produce the printer-ready output. The formatter may be a command-line tool, part of an editing suite, or a graphical user interface-based tool. It needs the XSL-FO document as its input and produces some form of printable output. The only other tool you will need is a printer (or similar output device) if you want paper-based output.
You should use XSLT to generate your XSL-FO from source documents (described later in this chapter). To do that, however, you need to have some idea of what XSL-FO documents look like, so we'll start by looking at the result XSL-FO documents.
The XSL-FO document specifies page layout, page size, any headers and footers, margins and page numbers, etc. For example, the page specifications may be for A4 pages (or U.S. letter pages) of a certain height and width. The title page may be specified separately from the main content. Other pages may need separate specification. The bulk of the content of the document is likely to have a common layout. Any appendixes may need page numbers with letter prefixes, for instance, page A1 for the first page of Appendix A. You can do all this using the page specifications.
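As a rough illustration of the page specifications just described, here is a minimal sketch of an XSL-FO document for an A4 page with a header, a footer carrying the page number, and a body. The master name main, the margin values, and the sample text are assumptions chosen for illustration only; they are not taken from the book.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Page specification: a single A4 page master (name "main" is hypothetical) -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="main"
        page-width="210mm" page-height="297mm"
        margin-top="20mm" margin-bottom="20mm"
        margin-left="25mm" margin-right="25mm">
      <!-- The body leaves room at top and bottom for header and footer -->
      <fo:region-body margin-top="15mm" margin-bottom="15mm"/>
      <fo:region-before extent="10mm"/>
      <fo:region-after extent="10mm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Content: header, footer with page number, then the main flow -->
  <fo:page-sequence master-reference="main">
    <fo:static-content flow-name="xsl-region-before">
      <fo:block font-size="9pt" text-align="end">Sample report</fo:block>
    </fo:static-content>
    <fo:static-content flow-name="xsl-region-after">
      <fo:block font-size="9pt" text-align="center">Page <fo:page-number/></fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="12pt">Hello, XSL-FO.</fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
```

For U.S. letter output, the same sketch would use page-width="8.5in" and page-height="11in" instead.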
Related Stylesheet Specifications
While XSL is a powerful set of formatting tools, it is far from the only option. XSL comes from a rich heritage of stylesheet development, and can be used with or in place of these technologies.
One of the originators of the W3C submission was James Clark, the initial editor of the W3C document. He worked on the Document Style Semantics and Specification Language (DSSSL) standard. The ISO/IEC 10179 document states that DSSSL provides the specification of document processing for two purposes:
1. The transformation language for transforming SGML documents marked up in accordance with one or more DTDs into other SGML documents marked up in accordance with other DTDs . . . .
2. The style language, where the result is achieved by applying a set of formatting characteristics to portions of the data, and the specification is, therefore, as precise as the application requires, leaving some formatting decisions, such as line-end and column-end decisions, to the composition and layout process.
From this, it's quite clear that XSL-FO falls into the second group, that of specifying the formatting of documents. DSSSL was designed with SGML in mind, whereas XSL-FO had XML in mind. The experience of DSSSL was a key input to XSL-FO.
You may be asking the obvious by now. Why not DSSSL? Why XSL-FO? Two very clear reasons are implementation and support. DSSSL has a small following for its single open source implementation, OpenJade, produced by a group of faithfuls who took up the development of Jade when James Clark ceased its development. The one commercial implementation is from Nextsolution (http://www.nextsolution.co.jp/English/index.html). They recently announced the release of Version 2.0. Their initial release followed Jade in 1998.
The one key advantage DSSSL has over the XSLT/XSL-FO combination is its full programming language support, the lack of which is a frequent complaint about XSLT. The Jade implementation is based on Scheme, one of the Lisp family of languages. (While it provides all the functionality of a full programming language, Scheme is not an especially popular language with the users of XML.) Another plus on the DSSSL side is that it can produce Rich Text Format (RTF) as an output, as used in Microsoft Word. The downside to DSSSL is its limited implementation. OpenJade has not added sufficiently to the original product to make it comprehensive in its capabilities. A series of limitations, combined with a steep learning curve, has deterred many people. DSSSL has very few tutorials and a specification written for implementors rather than users.
Using XSL-FO as Part of XSL
This section looks at the integration of XSLT and XSL-FO. The two recommendations started out as one and, rightly, have a close relationship. I make the assumption that the reader has some background in XSLT.
An XSLT transformation defines a mapping from the source document structures to XSL-FO formatting. When run, the XSLT transformation produces an XSL-FO document that is then run through a formatter. Tools can combine these two steps either overtly or behind the scenes, but it's worth understanding what happens under the hood. The advantage of this two-stage approach is that content selection can take place in the first stage. Certain parts of the source XML document may not be wanted in the final printed form. These can be ignored by this first stage. In the same way, literal content can be added by the stylesheet (to save the XML source document author having to retype a long company name, for instance), that is then output into the XSL-FO document, along with content from the source document, and that becomes a part of the final presentation.
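The first stage described above can be sketched as a small XSLT 1.0 stylesheet. The source vocabulary (report, title, para, note), the company name, and the master name main are all hypothetical, invented here for illustration. The sketch shows the three things the paragraph mentions: mapping source structures to formatting objects, ignoring unwanted source content, and adding literal content from the stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:output method="xml" indent="yes"/>

  <!-- Map the (hypothetical) report element to a complete FO document -->
  <xsl:template match="/report">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="main"
            page-width="210mm" page-height="297mm"
            margin-top="20mm" margin-bottom="20mm"
            margin-left="25mm" margin-right="25mm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="main">
        <fo:flow flow-name="xsl-region-body">
          <!-- Literal content supplied by the stylesheet, not the source -->
          <fo:block font-size="9pt">Example Publishing Company Ltd.</fo:block>
          <!-- Content taken from the source document -->
          <fo:block font-size="18pt" font-weight="bold">
            <xsl:value-of select="title"/>
          </fo:block>
          <xsl:apply-templates select="para"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Each source paragraph becomes a block -->
  <xsl:template match="para">
    <fo:block space-before="6pt"><xsl:apply-templates/></fo:block>
  </xsl:template>

  <!-- Content selection: note elements inside paragraphs produce no output -->
  <xsl:template match="note"/>

</xsl:stylesheet>
```

The result of running this stylesheet is an XSL-FO document that is then handed to the formatter as the second stage.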
XSL and XSL-FO have suffered from some naming confusion, largely because of history in the W3C. Initially, what we are calling XSL-FO was simply XSL, the Extensible Stylesheet Language. It became apparent that two-stage processing of SGML or XML into a print format was necessary. The prevailing view was that these two stages should be combined into a single W3C recommendation. This was proposed to the W3C, and XSL was born. When James Clark first released a product based on the working draft of this recommendation, its immediate use was for a slightly different purpose.
Remember I said that the transformation from XML into XSL-FO was done by XSLT? Initially, it was done by what was then called XSL. It soon became obvious that XSL had a very clear and quite large market using the transformation aspect to take one XML document through to another XML (or XHTML) format. This usage was well received, as people began using XSLT to transform XML into web-viewable HTML, XHTML, or WML (for mobile phones). Indeed, having realized that this general transformation capability was extremely useful, many people simply started to ignore the original purpose of XSL and demanded more features in this transformation area. User demand to speed up the delivery of the transformation side, at the expense of the formatting side, increased to the point where the Working Group accepted the inevitable and split out what are now XSLT and XPath from what remained XSL. (Some people still refer to XSLT as XSL.)
Shorthand, Short Form, and Inheritance
The word shorthand in XSL is reserved for the CSS-compatibility shorthands in the complete conformance level (it's described in section 7.29 of the specification, Shorthand Properties). Setting all components of a compound property by omitting the component specification is termed a short form; it is not a shorthand and is part of the basic conformance level. More on compound properties is discussed in Chapter 4. As an example, the background property (except for a specification of border width, color, and style for all four borders) is derived from and aligned with a similar CSS property.
By whatever name we choose to call it, a shorthand is a time-saving device for specifying more than one property with a single statement. For example, section 7.29.3 in the specification defines the single property, border. The border property is a shorthand property for setting the same width, color, and style for all four borders — top, bottom, left, and right — of a box. Section 7.29 of the specification lists them all. The visual ones are shown here:
background
border-bottom
background-position
border-color
border-left
border-bottom
border-right
border-style
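To make the border shorthand concrete, the fragment below sketches the same box twice: once with the shorthand and once with the individual width, style, and color properties for each side. It is a fragment that would sit inside a page-sequence; the namespace declaration is repeated on the flow only so that the fragment is well formed on its own, and the values are illustrative assumptions. Because shorthands belong to the complete conformance level, a given formatter may not support the first form.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Shorthand: one property sets width, style, and color on all four sides -->
  <fo:block border="1pt solid black">Boxed with the shorthand</fo:block>

  <!-- The same box spelled out with the individual properties -->
  <fo:block
      border-top-width="1pt"    border-top-style="solid"    border-top-color="black"
      border-bottom-width="1pt" border-bottom-style="solid" border-bottom-color="black"
      border-left-width="1pt"   border-left-style="solid"   border-left-color="black"
      border-right-width="1pt"  border-right-style="solid"  border-right-color="black">
    Boxed with individual properties
  </fo:block>

</fo:flow>
```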
Chapter 3: Pagination
Practical publishing projects start with a number of constraints. Unless you are experimenting, you will already know if your XML source is targeted at a book, an article, a business form, or a newsletter. In other words, the final product will be a concrete instance of what I will informally call document categories or document classes.
Decades and centuries of usage have established publishing conventions for many document classes. Many readers who have experience with at least one desktop publishing system or formatting software application will be familiar with standard document classes. These conventions suggest rules to be followed at all levels of the formatting process. It is at the pagination level, however, where the effect of these rules is most strongly felt. This chapter discusses XSL pagination — how to design pages and how to put them together.
Document Classes
The rules and conventions that apply to a given document class will determine the presence and structure of the three major divisions of any single document: the front matter, the main matter (probably most commonly known as the body), and the back matter. (These terms are generally used in connection with only certain types of documents. Because the concepts have more general utility, they will be extended to all documents.) The front matter obtains its fullest form in books and typically contains most of the following: a title page, a copyright page, a preface, a table of contents, and lists of figures or other illustrations. Dedications and similar material also belong to the front matter of a document.
The main matter of a document consists of the actual content: everything from the introduction to the appendixes. The back matter may contain an index, more acknowledgments, a glossary, a bibliography, a colophon, and so forth. It is worth pointing out at this stage of the discussion (and I will repeat this point often) that these are logical structures. They may be present and identifiable in your source XML, but will not be the same in the FO document. Equally, items like a table of contents will not exist in the source but will be generated when transforming to the fo namespace.
Depending on the specific type of document, the front matter may be greatly abbreviated, may be missing altogether, or may be combined with the main matter. This is typical of articles and reports. Letters and business forms may have only main matter. Books have all three, and these contain nearly all of the listed sections.
The Main Parts of an XSL-FO Document
XML instances in the fo namespace, or XSL-FO stylesheets, consist of two major parts. The first part describes the general layout of all possible pages and provides instructions to the formatter regarding which page templates to use. The second part assigns the actual content of the document to the pages and describes the formatting of the content. The general pagination problem consists of properly and fully constructing the first part and in making the proper assignment of content flows. This chapter will cover all of this in detail. The formatting of content remains for subsequent chapters.
The top-level element of an FO document is the fo:root element.
One important attribute on the fo:root element is source-document, which has been added so that the source document may be accessed from the XSL-FO document. It's a good habit to pass this to the XSLT stylesheet as a parameter for inclusion.
The children of the fo:root element consist of:
  • One layout-master-set
  • An optional declarations
  • One or more page-sequence
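The three kinds of children listed above appear in that order inside fo:root. The skeleton below is a sketch of that structure only. The master name main, the profile name, and the profile file path are hypothetical; the declarations element is optional and can be removed entirely if no color profile is in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- 1. Exactly one layout-master-set: the page descriptions -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="main"
        page-width="210mm" page-height="297mm">
      <fo:region-body margin-top="20mm" margin-bottom="20mm"
          margin-left="25mm" margin-right="25mm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- 2. Optional declarations: global resources such as color profiles.
          The profile name and file path are hypothetical. -->
  <fo:declarations>
    <fo:color-profile color-profile-name="example-profile"
        src="url('profiles/example.icc')"/>
  </fo:declarations>

  <!-- 3. One or more page-sequences: the content assigned to pages -->
  <fo:page-sequence master-reference="main">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Front matter goes here.</fo:block>
    </fo:flow>
  </fo:page-sequence>

  <fo:page-sequence master-reference="main">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Main matter goes here.</fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
```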
Figure 3-1 shows a very useful diagram from the XSL specification that illustrates the pagination formatting objects.
Figure 3-1: Pagination formatting objects
The declarations element, if used, contains one or more color-profile children. It is a wrapper for formatting objects whose content is to be used as a resource to the formatting process; in other words, it groups global declarations for the FO file. See Chapter 7 for a discussion on color profiles.
The layout-master-set corresponds to the first major part of an FO file that I mentioned earlier. Its function is to fully specify the pages to be used in the document. The children of this element consist of simple-page-master elements and page-sequence-master elements. You must have at least one simple-page-master defined. It is good practice to organize your simple-page-master
Simple Page Master
XSL 1.0 specifies just one way of laying out a page: the page description. We use the simple-page-master element for this page description. Any discussion of page masters presupposes the concept of a page. It may seem self-evident at this point that we do have a page, but there is actually more to this concept in XSL than immediately meets the eye.
The CSS and XSL specifications overlap, and this is reflected in shared models at various levels. CSS originally approached pagination from the web point of view — a single unlimited canvas, effectively restricted in the horizontal, but not in the vertical, direction. XSL is heavily weighted towards paged media; this distinction operates primarily at the level of page master selection, not in the description of single pages. However, CSS is actively embracing paper media (in CSS2), and XSL from the start has acknowledged formats other than print — namely, HTML.
This means in XSL, we must deal with the idea of non-paged media and viewports. Non-paged effectively means one page with flexible boundaries, which is obviously not the case with print. Hence, if you are reading this on a web browser, you are effectively viewing it in a non-paged form. Viewports introduce the ideas of clipping and scrolling, again, not things we will encounter in print. Fortunately, these are XSL capabilities that may be ignored by readers interested in print; implementors are not so lucky. I will sufficiently explain viewport concepts so you will be able to read the spec without confusion.
In XSL under normal (meaning print) circumstances, we use the page-width and page-height attributes on the simple-page-master element. In a production context, these attributes are obvious candidates for XSL parameterization. A simple model of the page is illustrated in Figure 3-2. Note that the labeling of the outer regions supposes a left-to-right, top-to-bottom (lr-tb) writing mode.
Figure 3-2: Simple page model
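The parameterization of page-width and page-height mentioned above might look something like the following sketch. The parameter names page.width and page.height, their A4 defaults, and the master name page are assumptions of mine, not anything fixed by XSL.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Hypothetical parameter names; the defaults describe an A4 sheet -->
  <xsl:param name="page.width" select="'210mm'"/>
  <xsl:param name="page.height" select="'297mm'"/>

  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <!-- Attribute value templates copy the parameters into the FO -->
        <fo:simple-page-master master-name="page"
            page-width="{$page.width}"
            page-height="{$page.height}">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <fo:block><xsl:apply-templates/></fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>
```

Most XSLT processors let you override top-level parameters when you run the transformation, so the same stylesheet could produce A4 or US Letter output without being edited.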
The page-viewport-area content rectangle is the outermost rectangle, and for any media, this represents the physical bounds of the output medium, e.g., the edges of the sheet of paper.
Complex Pagination
We have, so far, developed a reasonably complete understanding of simple-page-masters, but now it is time to examine complex pagination. What mechanism is available to us to specify the sequence of simple-page-masters that will be used to format a given page-sequence and the flows contained within it? For this purpose, XSL 1.0 provides the page-sequence-master element.
This section will look at how the children of a page-sequence-master may be used to vary the selection of page masters.
A page-sequence may select a simple-page-master directly, using the master-reference attribute, whose value matches the master-name of the simple-page-master. This simple-page-master then generates every page required by the flows contained in that page-sequence. In other words, the page master is referenced as many times as is needed. This is shown in Figure 3-6.
Figure 3-6: Single simple-page-master
A page-sequence may alternatively select a page-sequence-master, also through use of the master-reference attribute. The master-reference on the page-sequence matches the master-name on the page-sequence-master. This is most often useful when the layout goes beyond the simple, single layout needs, requiring varying simple-page-master usage, as is the case when recto and verso pages differ.
A page-sequence is not constrained to use a page-sequence-master that has not been used already. page-sequence-masters are not stateful, in this sense, and effectively "reset" themselves when called upon to supply page-masters to a new page-sequence.
The page-sequence-master is a container for so-called sub-sequence-specifiers, which, by definition, are children of the page-sequence-master. Each of the sub-sequence-specifiers defines a subsequence of the page-sequence in question; the sum of all subsequences is the sequence of pages that results from completely formatting the flow in that page-sequence.
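To make the recto/verso case concrete, here is a sketch of a layout-master-set that uses one sub-sequence-specifier, repeatable-page-master-alternatives, to alternate between two page masters. The master names recto, verso, and chapter and all of the dimensions are illustrative assumptions, and the fragment assumes the fo prefix is declared on an enclosing fo:root.

```xml
<fo:layout-master-set>
  <!-- Right-hand page: wider inner (left) margin; values are illustrative -->
  <fo:simple-page-master master-name="recto"
      page-width="210mm" page-height="297mm"
      margin-left="30mm" margin-right="20mm">
    <fo:region-body/>
  </fo:simple-page-master>

  <!-- Left-hand page: mirror image of recto -->
  <fo:simple-page-master master-name="verso"
      page-width="210mm" page-height="297mm"
      margin-left="20mm" margin-right="30mm">
    <fo:region-body/>
  </fo:simple-page-master>

  <!-- The page-sequence-master picks a page master for each page -->
  <fo:page-sequence-master master-name="chapter">
    <fo:repeatable-page-master-alternatives>
      <fo:conditional-page-master-reference
          master-reference="recto" odd-or-even="odd"/>
      <fo:conditional-page-master-reference
          master-reference="verso" odd-or-even="even"/>
    </fo:repeatable-page-master-alternatives>
  </fo:page-sequence-master>
</fo:layout-master-set>
```

A page-sequence with master-reference="chapter" would then draw odd-numbered pages from recto and even-numbered pages from verso.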
The following summary provides a rough description of page-sequence-master and its contents:
Element
page-sequence-master
Purpose
Specifies the constraints on, and the order in which, a certain set of
Page Sequences
So far, I have talked about aspects of fo:page-sequence — its children, the page-masters to which it points in one fashion or another — that pertain to how sequences of pages are married with their page-masters. I have also talked at length about the structure of a page. I have left out several properties that deal with page numbering, but I'll introduce them soon. There are also two other properties that I will talk about that introduce elements of internationalization.
The initial-page-number property fixes the page number for the first page of the page-sequence to which it applies. The values of the property and its interpretation are listed as follows:
auto
If this is the first page-sequence, the initial page number becomes 1. If it is not the first page-sequence, the initial page number of the current page-sequence becomes the page number of the last page of the preceding page-sequence, plus 1. That is, it simply continues numbering pages sequentially.
auto-odd
As for auto. If the resulting value is even, add 1.
auto-even
As for auto. If the resulting value is odd, add 1.
[number]
A positive integer, that is, 1 or greater. If a negative or non-integer value is supplied, it is rounded to the nearest integer greater than or equal to 1.
To force content to be numbered starting at, say, page 51, simply use this property, as in Example 3-7.
Example 3-7. Forced page numbering
<fo:page-sequence 
 master-reference="chapter" 
 initial-page-number="51"
...
If the first page-sequence has no value specified for initial-page-number, the default of auto is used, and hence, the first page is numbered as 1.
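A short worked example may help with the auto values. If the preceding page-sequence ends on page 12, auto gives 13, auto-odd gives 13 (it is already odd), and auto-even gives 14. If it ends on page 13, auto gives 14, auto-odd gives 15, and auto-even gives 14. The sketch below uses auto-odd so that a chapter always starts on an odd-numbered page; the master name chapter and the sample text are assumptions, and the fragment assumes the fo prefix is declared on an enclosing fo:root.

```xml
<!-- The first page of this sequence always receives an odd page number -->
<fo:page-sequence master-reference="chapter"
    initial-page-number="auto-odd">
  <fo:flow flow-name="xsl-region-body">
    <fo:block>Chapter text goes here.</fo:block>
  </fo:flow>
</fo:page-sequence>
```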
The force-page-count property imposes a condition on the number of pages in a page-sequence. This number may be an absolute count or a parity condition. For each condition, if the condition is not satisfied, one page is added to the current page-sequence
Chapter 4: Areas
In Chapter 3, I discussed how pages are organized into sequences, how page masters are selected for processing, and how the page area is divided into regions. In this chapter, we will delve deeper into what happens on a page. We will go into some detail about page layout. After this, you should appreciate why the formatter produces output as it does, and perhaps have some sympathy with implementors.
Informal Definition of an Area
As you have seen in Chapter 3, formatting objects contain data that should be rendered as a series of marks on the canvas — text, images, lines, etc. The formatter turns objects into series of imaginary rectangles on the page, called areas. One object may produce more than one area: e.g., an fo:block element produces two areas if split by a page break, as shown in Figure 4-1.
Figure 4-1: A block split over a page boundary
FOs have properties that specify constraints on the appearance and placement of areas generated by them. These constraints are used to calculate area traits, which are attributes of areas that uniquely identify their placement, appearance, and contents. Most properties and traits have one-to-one correspondence: e.g., the color property unambiguously defines a trait with the same name. But there are several cases where relations between properties and traits are more complicated; they will be considered later.
Traits are actual attributes of an area as calculated by the formatter, whereas properties are a set of constraints imposed on the traits.
Areas form a tree structure: a larger area can contain smaller subareas. Typically, the area tree closely resembles the source FO tree: an area generated by formatting object A contains subareas generated by descendant elements of A. Important exceptions are out-of-line elements, such as floats and footnotes.
Area Types
Areas created by formatting objects can be of two principal types:
Inline-areas
These areas correspond to text chunks, inline images, etc. Areas of this type are stacked on a line in the inline-progression-direction (see Chapter 6). Inline-areas are placed inside other inline-areas or inside line areas. The following objects create only inline-areas: fo:character, fo:inline, fo:inline-container, fo:bidi-override, fo:leader, fo:external-graphic, fo:instream-foreign-object, fo:page-number, and fo:page-number-citation.
Block-areas
These areas correspond to text paragraphs, tables, lists, etc. Areas of this type are stacked on a page in the block-progression-direction (see Chapter 5). The following objects create only block-areas: fo:block, fo:block-container, fo:table, fo:table-and-caption, and fo:list-block.
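The distinction can be seen in even a tiny fragment. In the following sketch (the text and font choices are mine, and the fo prefix is assumed to be declared on an enclosing fo:root), the fo:block generates a block-area, while fo:inline and fo:page-number generate inline-areas that are stacked on the lines inside it.

```xml
<!-- fo:block generates a block-area (or several, if it is split) -->
<fo:block font-family="Times" font-size="12pt">
  This paragraph is a block-area; the
  <!-- fo:inline generates inline-areas placed on a line -->
  <fo:inline font-style="italic">emphasized words</fo:inline>
  are inline-areas, as is the number of this page:
  <!-- fo:page-number also generates an inline-area -->
  <fo:page-number/>.
</fo:block>
```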
Each area has a set of font traits, derived from font properties of the respective formatting object. These traits uniquely define a nominal font associated with the area. The area need not actually contain glyphs from this font; parameters of the nominal font may be used in calculating area position. Two such traits are text-altitude and text-depth: they specify the block-progression-dimension of glyph-areas and are used in line-stacking calculations. These are the low-level items that determine the area sizes.
Two more area types are useful for defining the area model:
Glyph-area
These areas can be viewed as an extreme case of an inline-area, corresponding to a single glyph. Every printable character of the text data in the source FO tree generates a glyph area. A glyph-area has two important traits that other areas don't have:
text-altitude
The height of the nominal ascender of the font to which the glyph belongs
text-depth
The depth of the nominal descender of the font
Components of an Area
An area may have a border around it, with or without a background inside it (which may be an image or a color fill). The following are terms for rectangles that constitute an area:
Content rectangle
This is the innermost part of an area. It represents the space actually available to host area contents, such as children areas, glyphs, and graphics.
Padding rectangle
This rectangle extends up to the inner boundary of the border. It includes the content rectangle plus padding offsets from all the four sides. This rectangle delimits the zones covered by the background of the area.
Border rectangle
This rectangle is delimited by the external edge of the border frame. It includes the padding rectangle, plus border widths of all the four sides. Except for special cases (absolute/relative positioning, overflow, out-of-line elements, etc.), no marks are produced by a formatting object outside the border rectangle of its generated area(s) — the rectangle is surrounded by spaces transparent to marks left by other areas.
All these rectangles are also present in the CSS2 box model. There is one more rectangle defined in CSS: a margin rectangle that incorporates margins around the border. In XSL, margins are not used for area positioning (they are replaced by spaces), so it does not make sense to include the respective rectangle in the model. Figure 4-5 shows the content, padding, and border rectangles.
Figure 4-5: Area nomenclature
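To relate the three rectangles to properties you will actually write, here is a small sketch. The particular widths and colors are arbitrary choices of mine, and the fo prefix is assumed to be declared on an enclosing fo:root.

```xml
<!--
  Content rectangle: holds the text itself.
  Padding rectangle: content plus 6pt on each side; the grey
                     background fills this rectangle.
  Border rectangle:  padding rectangle plus the 1pt border
                     on each side.
-->
<fo:block padding="6pt"
    border-style="solid"
    border-width="1pt"
    border-color="black"
    background-color="#eeeeee">
  Text placed in the content rectangle.
</fo:block>
```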
Reference Areas
In CSS2, normally positioned blocks have properties that determine their placement with respect to the content rectangle of their parent box. All boxes are equivalent: children inside a box are always stacked in the same manner.
XSL takes a different approach: it designates some areas for use as reference for defining inline-progression-dimension and orientation of their descendant areas. Such areas are called reference areas; I will refer to other areas as normal.
Reference areas have the following distinctive features:
  • They define starting points for start-indent and end-indent traits of all descendant normal block-areas.
  • They can set new writing-mode and reference-orientation (normal areas can change inline-progression-direction only via the direction property or bidi mechanism).
  • Their dimension is always bound in both directions, and the display-align trait can be set to align their contents in the block-progression-dimension.
All region and column areas are reference areas; areas produced by fo:block elements are normal. table-cell is a reference area; label and body of a list item are normal. Only three formatting objects can explicitly generate reference areas:
  • fo:table-cell
  • fo:block-container
  • fo:inline-container
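As a sketch of two of the features listed above, the following fragment uses fo:block-container to generate a reference area explicitly. The height, border, and indent values are arbitrary assumptions, and the fo prefix is assumed to be declared on an enclosing fo:root.

```xml
<!-- The block-container generates a reference area of fixed height;
     display-align centers its contents in the block-progression-direction -->
<fo:block-container height="40mm"
    display-align="center"
    border-style="solid" border-width="0.5pt">
  <!-- start-indent is measured from the start edge of the nearest
       ancestor reference area, here the block-container -->
  <fo:block start-indent="5mm">
    A normal block-area positioned relative to the container.
  </fo:block>
</fo:block-container>
```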
Area Positioning
The XSL specification defines a large variety of properties to express constraints over the appearance of areas on a page. The formatter tries to choose an optimal location for the area. This is not a straightforward process: properties may clash with each other, giving rise to an overconstrained geometry specification. It's the formatter's task to choose a location for an area that will satisfy as many constraints as possible. The XSL spec is flexible about rule conflicts: it defines rules for prioritizing some constraints over others, but delegates the right to make the final decision to the formatter engine.
Next, I will analyze properties for expressing area position and dimensions, and describe their interaction rules.
It is not uncommon that a single formatting object produces two or more areas. A single block of text may be split by a page break; an inline element may be scattered into several lines. Traits of the resulting areas are controlled by properties of their source formatting object.
Borders and padding can be applied conditionally, using the extended property, border-after-width.conditionality, for instance. The values are either retain or discard, and affect the border or padding when it is at the beginning or the end of a reference area. This can cause problems when you actually want the border or padding at the before or start side, and the conditionality is set to discard (the default). It is explained further when I discuss space resolution in Section 4.5.4.
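If you do want a border and padding kept when a block is split or starts a reference area, a sketch along the following lines may help. It sets the length and conditionality components of the before-side properties explicitly; the 1pt and 3pt values are arbitrary, and the fo prefix is assumed to be declared on an enclosing fo:root.

```xml
<!-- Without the .conditionality components, the default (discard)
     would apply to the before border and padding -->
<fo:block
    border-before-style="solid"
    border-before-width.length="1pt"
    border-before-width.conditionality="retain"
    padding-before.length="3pt"
    padding-before.conditionality="retain">
  A block whose before border and padding are retained.
</fo:block>
```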
This is another case where a trailing area (here, a border) may be discarded if it is the last in a reference area. Roughly, this means if a sequence of areas has a border specified, the final one may be discarded, because its area is lost in the parent area. For example, if you have specified border-after on six successive areas, with the final one ending a chapter, this may be discarded (default) by the formatter, because its area will be lost in the break before the start of the next chapter. Similar logic works in the inline-progression-direction
Chapter 5: Blocks
Blocks represent smaller parts of a document, familiar as features such as paragraphs, lists, and tables. Using these pieces, you can structure your documents and present them within the page contexts you've established.
Block Basics
Think of the last document you styled. Each major space-separated block of contiguous text, graphic, table, or list is most likely to be a block when styled with XSL-FO. fo:block could be called the basic building block of page content. Simply inserting content into an fo:block element produces a simple paragraph style with all the default properties. Blocks are most commonly used within the page layout you have specified, specifically within the fo:flow element.
To appreciate the flexibility of blocks, it's necessary first to select the right type of block, then to select from its list of available properties.
The top-level blocks include:
  • fo:block
  • fo:block-container
  • fo:list-block
  • fo:table
These are the major divisions, each producing an area within the block-progression-direction, visually separated by a new line. I'll cover each of these in turn.
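For orientation, here is a sketch of a flow that contains one of each of these block-level objects in its smallest useful form. The text, distances, and column setup are assumptions for illustration, and the fragment assumes it sits inside an fo:page-sequence with the fo prefix declared.

```xml
<fo:flow flow-name="xsl-region-body">
  <!-- A plain paragraph -->
  <fo:block>A plain paragraph.</fo:block>

  <!-- A block-container wrapping another block -->
  <fo:block-container>
    <fo:block>Content inside a block-container.</fo:block>
  </fo:block-container>

  <!-- A one-item list -->
  <fo:list-block provisional-distance-between-starts="8mm"
      provisional-label-separation="2mm">
    <fo:list-item>
      <fo:list-item-label end-indent="label-end()">
        <fo:block>1.</fo:block>
      </fo:list-item-label>
      <fo:list-item-body start-indent="body-start()">
        <fo:block>A list item.</fo:block>
      </fo:list-item-body>
    </fo:list-item>
  </fo:list-block>

  <!-- A one-cell table -->
  <fo:table table-layout="fixed" width="100%">
    <fo:table-column column-width="proportional-column-width(1)"/>
    <fo:table-body>
      <fo:table-row>
        <fo:table-cell>
          <fo:block>A table cell.</fo:block>
        </fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>
</fo:flow>
```

Each of the four children is stacked below the previous one in the block-progression-direction.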
The content model for a block consists of other blocks, inlines, or textual content. The simple block, acting as a paragraph, is likely to be your most used element in the fo namespace, for normal text-heavy documents. Note that the same fo:block may be used for any content that requires whitespace separation in the block-progression-direction. This ranges from the title of a document on a page by itself to list item contents. The block is a versatile element.
The stylesheet snippet in Example 5-1 picks out the para elements in an XML source document and styles them as justified blocks with no start indent, fairly typical spacing before and after, the Times font, and simple content. The border around the block is there simply to outline its area, as I'll be referring to this again.
Example 5-1. A simple block
<xsl:template match="para">
  <fo:block
      border-style="solid"
      border-width=".1mm"
      font-family="Times"
      font-size="12pt"
      space-before="12pt"
      space-after="12pt"
      text-align="justify">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
Blocks for Special Purposes
The use of blocks for some purposes may not be readily obvious. A simple heading in a larger font may not be seen as a variant of a block, but it is. It may be used as the main heading of a document.
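A heading of this kind might be styled along the following lines. This is only a sketch: the title element name, the Helvetica font, and the sizes are my assumptions, and the template assumes the xsl and fo prefixes are declared on the enclosing stylesheet.

```xml
<!-- "title" is a hypothetical element in the source document -->
<xsl:template match="title">
  <fo:block
      font-family="Helvetica"
      font-size="18pt"
      font-weight="bold"
      space-before="18pt"
      space-after="6pt"
      keep-with-next.within-page="always">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
```

The keep-with-next.within-page setting asks the formatter to keep the heading on the same page as the block that follows it.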
A title page is simply a single block that has break-before and break-after set to the value page, and the space-before.conditionality set to retain. Example 5-12 shows a specific example of this practice.
Example