Word’s Processing Model for Editing XML

When Word opens an arbitrary XML document (i.e., an XML document that is not WordprocessingML), that XML document undergoes four primary processes from the time that it is opened to the time that it is saved, in this order:

  1. When the document is first opened, an onload XSLT stylesheet (variously called an “XML data view” or “solution” in the Word UI) is applied, transforming the raw XML into a WordprocessingML document, usually intermixed, or merged, with custom XML tags from the original document.

  2. A user edits the document, modifying the underlying merged representation.

  3. Upon saving, all WordprocessingML elements and attributes are optionally stripped out, leaving only custom XML markup. This option is called “Save data only.”

  4. Finally, an onsave XSLT stylesheet is optionally applied to the result of step 3. This option is called “Apply transform.”
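
To make step 3 concrete, here is a minimal sketch in Python of what “Save data only” amounts to: WordprocessingML wrappers are unwrapped, and only the custom elements and their text remain. It approximates the idea and is not Word's actual algorithm. The `urn:example:memo` namespace, the memo markup, and the function names are all hypothetical, and the input is written without whitespace between elements so that no stray text survives.

```python
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements in Word 2003.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def is_wordml(name):
    """True if a qualified element or attribute name is in the WordprocessingML namespace."""
    return name.startswith("{" + W_NS + "}")


def content(elem):
    """List the text strings and rebuilt custom elements inside elem,
    unwrapping any WordprocessingML elements along the way."""
    items = []
    if elem.text:
        items.append(elem.text)
    for child in elem:
        if is_wordml(child.tag):
            items.extend(content(child))   # drop the wrapper, keep what is inside
        else:
            items.append(rebuild(child))   # keep the custom element
        if child.tail:
            items.append(child.tail)
    return items


def rebuild(elem):
    """Copy a custom element without WordprocessingML attributes or descendants."""
    attrs = {k: v for k, v in elem.attrib.items() if not is_wordml(k)}
    new = ET.Element(elem.tag, attrs)
    last = None
    for item in content(elem):
        if isinstance(item, str):
            if last is None:
                new.text = (new.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            new.append(item)
            last = item
    return new


def save_data_only(merged_root):
    """Return the single custom root element of a merged document."""
    if not is_wordml(merged_root.tag):
        return rebuild(merged_root)
    customs = [i for i in content(merged_root) if not isinstance(i, str)]
    if len(customs) != 1:
        raise ValueError("expected exactly one top-level custom element")
    return customs[0]


# Hypothetical merged document: custom memo tags wrapped around Word markup.
MERGED = (
    '<w:wordDocument xmlns:w="' + W_NS + '" xmlns:m="urn:example:memo">'
    "<w:body><m:memo>"
    "<m:to><w:p><w:r><w:t>Alice</w:t></w:r></w:p></m:to>"
    "<m:body><w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Ship it.</w:t></w:r></w:p></m:body>"
    "</m:memo></w:body></w:wordDocument>"
)

result = save_data_only(ET.fromstring(MERGED))
print(ET.tostring(result, encoding="unicode"))
```

The printed result contains only the memo elements and their text; the paragraph, run, and formatting markup is gone.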

This basic flow is illustrated in the data flow diagram in Figure 4-7.

Figure 4-7. Word’s basic processing model for editing custom XML

Each arrow in Figure 4-7 represents the XML document in a different state of transformation, and each process operates on the result of the previous one. The last two processes, “Save data only” and “Apply custom transform,” are both optional. When an option is turned off, you can think of its process as an identity transform, or a no-op. For example, if “Save data only” is turned off but “Apply transform” is turned on, then the onsave transform effectively operates on the result of process 2, “User edits document.”
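
The idea of an unelected option acting as an identity transform can be sketched as a simple function pipeline. The following Python toy is only a model of the data flow: the "document" is a list that records which processes touched it, and every name in it (`stage`, `make_pipeline`, and so on) is hypothetical rather than part of Word.

```python
def identity(doc):
    """Stand-in for an option that is turned off: pass the document through."""
    return doc


def stage(name):
    """Return a toy process that appends its name to the document's history."""
    def process(doc):
        return doc + [name]
    return process


def make_pipeline(save_data_only=False, apply_transform=False):
    """Chain the four processes; disabled options become identity transforms."""
    steps = [
        stage("onload XSLT"),
        stage("user edits"),
        stage("save data only") if save_data_only else identity,
        stage("onsave XSLT") if apply_transform else identity,
    ]

    def run(doc):
        for step in steps:
            doc = step(doc)   # each process works on the previous result
        return doc

    return run


# "Save data only" off, "Apply transform" on: the onsave transform
# receives the result of the user's edits directly.
history = make_pipeline(save_data_only=False, apply_transform=True)(["raw XML"])
print(history)
# ['raw XML', 'onload XSLT', 'user edits', 'onsave XSLT']
```

Because a disabled step simply returns its input, the processes that follow it never need to know whether it ran.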

In the next several sections of this chapter, we will detail each of these processes, including how the onload XSLT stylesheet is selected, what the merged representation looks like, what editing functionality is available to the user, how the “Save data only” option works and how to set it, and how an onsave XSLT transformation is selected. But first, let’s take a look at the Schema Library, an important ingredient that is not explicitly evident in this diagram. It is important because it is consulted both to determine which onload XSLT transformation to apply and to enable on-the-fly schema validation while the document is being edited.
