Document Structure and Formatting

Now that you’ve been inundated with information about lots of document-level constructs, let’s move into the actual content of a Word document and how it is represented in WordprocessingML. All Word documents contain three levels of hierarchy: one or more sections containing zero or more paragraphs containing zero or more characters. A run is a grouping of contiguous characters that have the same properties. Tables can occur where paragraphs can, and list items are just a special kind of paragraph. You cannot have nested structures in WordprocessingML—sections within sections, or paragraphs within paragraphs. The one exception to this rule is that tables may contain tables.

Runs

A “run” is the basic leaf container for a document’s content and is represented by the w:r element. As we’ve seen, the w:r element may contain w:t elements, which contain text. Including the w:t element, there are 24 valid child elements of the w:r element, representing things like text, images, deleted text, hyphens, breaks, tabs, footnotes, endnotes, footnote and endnote references, page numbers, field text, etc. We’ll look at just a few of these.

The w:r element may occur in five separate element contexts: w:p, w:fldSimple, w:hlink, w:rt, and w:rubyBase. The first one, the paragraph, is the most common. The w:fldSimple element represents a Word field, the w:hlink element represents a hyperlink in Word, and the w:rt (“ruby text”) and w:rubyBase elements ...

Get Office 2003 XML now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.