Attaching Schemas to a Document

A given WordprocessingML document can have one or more schemas “attached” to it. The purpose of schema attachment is to enable two things:

  • On-the-fly schema validation as the user edits the document

  • Schema-driven editing functionality

Schema validation happens automatically as a user edits the document. If a particular element declared in an attached schema is present in the document and does not conform to the type defined in the schema, then Word will flag this as an error. We’ve seen examples of this in our press release example, for certain simple types such as xsd:date.

Schema-driven editing functionality is exposed through the XML Structure task pane (covered below) and the Document Actions task pane (covered in Chapter 5).

The Word UI allows you to manually attach schemas to the currently open document. Figure 4-17 shows the appropriate dialog, which you can access by selecting Tools → Templates and Add-Ins and then opening the XML Schema tab.

Figure 4-17. Manually attaching an XML schema to a document

The “Available XML schemas” list contains the aliases for all of the schemas in the schema library. In this example, the Press Release checkbox is checked, which means that the press release schema is attached to the current document. Multiple schemas can be attached to the same document, just as elements from multiple namespaces can be used in the same XML document.

The Add Schema... button lets you browse for an XSD schema document file in order to add it to your machine’s schema library. By default, it also attaches the schema to the document—automatically checking the corresponding checkbox that newly appears in the “Available XML schemas” list. The Schema Library button opens the Schema Library dialog, which we looked at earlier.

Demystifying Schema Attachment

If all you ever do is manually attach schemas through the Word UI, the process of “schema attachment” may seem a little mysterious. First, stop thinking of it as a process. Think of it instead as a property of the underlying WordprocessingML document. Second, it’s important to understand that Word treats namespaces and schemas as virtually synonymous. Saying that a schema is “attached” to a document means only that a non-WordprocessingML namespace declaration is present somewhere inside the WordprocessingML document. A “non-WordprocessingML namespace declaration” is a declaration for any namespace other than the namespaces reserved for Word that were introduced in Chapter 2. So when Word says that a schema is attached to a document, it really means that a namespace is attached.

Whether a schema is attached to the document is independent of whether a corresponding schema library entry is present on the current user’s machine. It doesn’t even matter whether the document contains any element or attribute that uses the namespace.

Example 4-6 shows a simple WordprocessingML document with a schema attached, i.e., with a declaration for a namespace that is not one of Word’s reserved namespaces.

Example 4-6. A WordprocessingML document with a “schema attached”

<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  xmlns:foo="http://xmlportfolio.com/pressRelease">
   
  <w:body/>
   
</w:wordDocument>

If someone in our imaginary PR department opened this document in Word, selected Tools → Templates and Add-Ins, and opened the XML Schema tab, they would see something very similar to the dialog box we saw in Figure 4-8 (assuming they already have the Press Release schema in their schema library). Specifically, the Press Release checkbox would be checked. As far as Word is concerned, the mere presence of the namespace declaration (anywhere in the document) means that the schema is attached, even if no element or attribute in the document uses the namespace.
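To make the idea that “attachment is just a namespace declaration” concrete, here is a minimal Python sketch that lists the namespaces Word would report as attached schemas. It is an illustration only, not anything Word exposes. The function name `attached_namespaces` is hypothetical, and the set of reserved Word namespaces is an assumed, partial list; Chapter 2 has the authoritative set.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET

# Assumed, partial list of namespaces reserved for WordprocessingML (see Chapter 2).
WORD_NAMESPACES = {
    "http://schemas.microsoft.com/office/word/2003/wordml",
    "http://schemas.microsoft.com/office/word/2003/auxHint",
    "http://schemas.microsoft.com/aml/2001/core",
    "http://schemas.microsoft.com/schemaLibrary/2003/core",
    "urn:schemas-microsoft-com:office:office",
    "urn:schemas-microsoft-com:office:word",
    "urn:schemas-microsoft-com:vml",
    "uuid:C2F41010-65B3-11d1-A29F-00AA00C14882",
}


def attached_namespaces(xml_text: str) -> set[str]:
    """Return every non-WordprocessingML namespace declared anywhere in the document."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    declared = set()
    # A "start-ns" event fires once per namespace declaration, wherever it occurs,
    # whether or not any element or attribute actually uses that namespace.
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        declared.add(uri)
    return declared - WORD_NAMESPACES


EXAMPLE_4_6 = """<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  xmlns:foo="http://xmlportfolio.com/pressRelease">
  <w:body/>
</w:wordDocument>"""

print(attached_namespaces(EXAMPLE_4_6))  # {'http://xmlportfolio.com/pressRelease'}
```

The press release namespace is reported even though no element in Example 4-6 uses the `foo` prefix, which mirrors Word’s behavior described above.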

What happens if the user doesn’t have a corresponding schema library entry? In that case, the schema is no less attached, because we’ve defined “schema attachment” as the presence of a non-WordprocessingML namespace declaration. However, in this case, the attached schema would be considered “unavailable.” Figure 4-18 shows how the Word UI handles this scenario.

Figure 4-18. An attached, but unavailable, schema

As you can see, a checkbox is still checked, meaning that “a schema is attached.” The only difference is that, since there is no corresponding schema library entry, this schema is considered to be “Unavailable.” And without a corresponding XSD schema document, schema validation and schema-driven editing are not possible.

Thus, for schema validation to work correctly, two conditions must hold:

  • The schema must be attached (the namespace must be declared in the document)

  • The schema must be available (present in the machine’s schema library)
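The two conditions can be modeled in a few lines. The following Python sketch treats the schema library as a plain dictionary from namespace URI to XSD file path. That is a hypothetical simplification; Word’s real library stores more than this. The sketch reports each attached namespace as “available” or “unavailable”, matching the labels in the Templates and Add-Ins dialog. The function names and the `pressRelease.xsd` path are illustrative only, and the reserved-namespace list is again an assumed, partial one.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET

# Assumed, partial list of namespaces reserved for WordprocessingML (see Chapter 2).
WORD_NAMESPACES = {
    "http://schemas.microsoft.com/office/word/2003/wordml",
    "http://schemas.microsoft.com/office/word/2003/auxHint",
    "http://schemas.microsoft.com/aml/2001/core",
    "http://schemas.microsoft.com/schemaLibrary/2003/core",
    "urn:schemas-microsoft-com:office:office",
    "urn:schemas-microsoft-com:office:word",
    "urn:schemas-microsoft-com:vml",
}


def attached_namespaces(xml_text: str) -> set[str]:
    """Condition 1: non-WordprocessingML namespaces declared in the document."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    declared = {uri for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",))}
    return declared - WORD_NAMESPACES


def schema_status(xml_text: str, schema_library: dict[str, str]) -> dict[str, str]:
    """Condition 2: an attached schema is usable only if the library has an entry for it."""
    return {
        ns: "available" if ns in schema_library else "unavailable"
        for ns in attached_namespaces(xml_text)
    }


DOC = """<?xml version="1.0"?>
<w:wordDocument
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  xmlns:foo="http://xmlportfolio.com/pressRelease">
  <w:body/>
</w:wordDocument>"""

# Hypothetical schema libraries: namespace URI -> path of the XSD schema document.
empty_library: dict[str, str] = {}
pr_library = {"http://xmlportfolio.com/pressRelease": "pressRelease.xsd"}

print(schema_status(DOC, empty_library))  # {'http://xmlportfolio.com/pressRelease': 'unavailable'}
print(schema_status(DOC, pr_library))     # {'http://xmlportfolio.com/pressRelease': 'available'}
```

In both cases the schema is attached. Only the second library makes validation possible.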

Now let’s relate all of this back to our primary use case—using Word as an XML editor. If you recall the basic processing model, the first thing that happens when Word opens an arbitrary XML document is that an XSLT stylesheet is applied to it, converting it to WordprocessingML. Even though the schema library is consulted to see which XSLT stylesheet to apply (based on the namespace of the document’s root element), no schemas have been attached at this point.
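The stylesheet lookup described above depends only on the root element’s namespace. The Python sketch below models it, assuming a hypothetical library that maps each namespace to a single onload stylesheet. This is a simplification, and the stylesheet file name and the `pressRelease`/`title` element names are invented for illustration. Note that the lookup reads the source document but attaches nothing. Attachment is decided later by what the stylesheet outputs.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical schema library entries: namespace URI -> onload XSLT stylesheet.
SCHEMA_LIBRARY_SOLUTIONS = {
    "http://xmlportfolio.com/pressRelease": "pressRelease-onload.xsl",
}


def root_namespace(xml_text: str) -> str:
    """Return the namespace URI of the root element ('' if it has none)."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree spells a namespaced name as "{namespace-uri}local-name".
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def choose_onload_stylesheet(xml_text: str) -> str | None:
    """Pick an onload stylesheet by root namespace; no schema is attached by this step."""
    return SCHEMA_LIBRARY_SOLUTIONS.get(root_namespace(xml_text))


press_release = """<pr:pressRelease xmlns:pr="http://xmlportfolio.com/pressRelease">
  <pr:title>Example</pr:title>
</pr:pressRelease>"""

print(choose_onload_stylesheet(press_release))  # pressRelease-onload.xsl
print(choose_onload_stylesheet("<memo/>"))      # None
```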

Whether a schema is ultimately attached to the document that the user edits is completely determined by whether the result of the onload XSLT transformation includes any non-WordprocessingML namespace declarations. Of course, if the result document contains any custom XML elements in your schema’s namespace, then the schema will de facto be attached (because you can’t have an element without declaring its namespace). And since schema validation is usually only useful when custom XML elements are already present, schema attachment is usually an automatic thing you don’t have to think about; it just happens. Even so, understanding how it works is helpful for debugging and for explaining where unwanted “unavailable” schemas come from—namely, wayward namespace declarations in the result of the onload transformation. (The onload XSLT stylesheets will therefore often use the exclude-result-prefixes and extension-element-prefixes attributes to prevent unwanted namespace declarations appearing in the WordprocessingML document.)
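When an unexpected “unavailable” schema shows up, a quick way to find the culprit is to look for namespaces that the onload result declares but never uses. The Python sketch below applies that heuristic to a WordprocessingML result document. It is a debugging aid only. A namespace you declared on purpose without using it would also be flagged. The `urn:example:onload-utilities` namespace stands in for a hypothetical helper namespace that leaked out of a stylesheet, and the reserved-namespace list is an assumed, partial one.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET

# Assumed, partial list of namespaces reserved for WordprocessingML (see Chapter 2).
WORD_NAMESPACES = {
    "http://schemas.microsoft.com/office/word/2003/wordml",
    "http://schemas.microsoft.com/office/word/2003/auxHint",
    "http://schemas.microsoft.com/aml/2001/core",
    "urn:schemas-microsoft-com:office:office",
    "urn:schemas-microsoft-com:vml",
}


def namespace_of(name: str) -> str:
    """Extract the URI from an ElementTree '{uri}local' name ('' if unqualified)."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def wayward_namespaces(xml_text: str) -> set[str]:
    """Non-Word namespaces that are declared but used by no element or attribute."""
    declared, used = set(), set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            declared.add(item[1])  # item is a (prefix, uri) pair
        else:
            used.add(namespace_of(item.tag))
            used.update(namespace_of(attr) for attr in item.attrib)
    return declared - used - WORD_NAMESPACES


RESULT = """<w:wordDocument
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  xmlns:pr="http://xmlportfolio.com/pressRelease"
  xmlns:util="urn:example:onload-utilities">
  <w:body>
    <pr:pressRelease/>
  </w:body>
</w:wordDocument>"""

print(wayward_namespaces(RESULT))  # {'urn:example:onload-utilities'}
```

The press release namespace is not flagged because an element uses it, so its attachment is intended. The helper namespace is the kind of declaration that adding its prefix to `exclude-result-prefixes` in the onload stylesheet would suppress.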
