Document Structure and Formatting
Now that you’ve been inundated with information about lots of document-level constructs, let’s move into the actual content of a Word document and how it is represented in WordprocessingML. All Word documents contain three levels of hierarchy: one or more sections containing zero or more paragraphs containing zero or more characters.
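To make the paragraph level of this hierarchy concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It parses a hand-written, heavily simplified fragment, not a complete document as Word would save it. It assumes the Word 2003 WordprocessingML namespace URI for the w: prefix. It skips the section level and walks only the paragraphs and the w:r (run) elements inside them. The function name outline is our own and is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed for Word 2003 WordprocessingML (the "w:" prefix).
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Hand-written, simplified fragment; real Word output carries many more
# elements (section properties, styles, document properties, and so on).
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:r><w:t>Hello, </w:t></w:r>
      <w:r><w:t>world.</w:t></w:r>
    </w:p>
    <w:p/>
  </w:body>
</w:wordDocument>"""


def outline(xml_text):
    """Return one list of run texts per paragraph, in document order."""
    root = ET.fromstring(xml_text)
    result = []
    # iter() finds w:p elements at any depth (including inside tables).
    for p in root.iter(f"{{{W}}}p"):
        runs = []
        for r in p.findall("w:r", NS):
            # Concatenate the text of every w:t directly inside this run.
            runs.append("".join(t.text or "" for t in r.findall("w:t", NS)))
        result.append(runs)
    return result


print(outline(SAMPLE))  # [['Hello, ', 'world.'], []]
```

The empty w:p shows that a paragraph may contain zero runs, matching the "zero or more" wording above.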
A run is a grouping of contiguous characters that have the same properties.
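The definition of a run can be illustrated without any XML at all. The sketch below assumes a hypothetical in-memory representation where each character is paired with a frozen set of property names such as "bold". These names are illustrative only and are not WordprocessingML syntax. The sketch merges adjacent characters whose properties match. itertools.groupby fits this job because it groups only consecutive equal keys, which is exactly the "contiguous" requirement.

```python
from itertools import groupby

# Hypothetical input: (character, properties) pairs. The property names
# are illustrative and do not correspond to WordprocessingML elements.
chars = [
    ("H", frozenset({"bold"})),
    ("i", frozenset({"bold"})),
    (" ", frozenset()),
    ("t", frozenset({"italic"})),
    ("o", frozenset({"italic"})),
]


def to_runs(chars):
    """Merge contiguous characters with identical properties into runs."""
    runs = []
    # groupby merges only *adjacent* items with equal keys, so two
    # separated stretches of bold text become two distinct runs.
    for props, group in groupby(chars, key=lambda c: c[1]):
        runs.append(("".join(ch for ch, _ in group), props))
    return runs


for text, props in to_runs(chars):
    print(repr(text), sorted(props))
```

This prints three runs: 'Hi' (bold), ' ' (no properties), and 'to' (italic).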
Tables can occur wherever paragraphs can, and list items are just a special kind of paragraph. WordprocessingML does not allow nested structures such as sections within sections or paragraphs within paragraphs. The one exception to this rule is that tables may contain tables.
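Because tables are the one structure that may nest, it can be useful to measure how deeply they do. The following sketch assumes the same Word 2003 namespace URI and the w:tbl, w:tr, and w:tc element names for tables, rows, and cells. It uses a simplified hand-written fragment, and the function name table_depth is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
TBL = f"{{{W}}}tbl"

# Simplified body: a table whose single cell holds another table.
SAMPLE = f"""<w:body xmlns:w="{W}">
  <w:tbl><w:tr><w:tc>
    <w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>
    <w:p/>
  </w:tc></w:tr></w:tbl>
  <w:p/>
</w:body>"""


def table_depth(elem):
    """Return the deepest level of w:tbl nesting at or below elem."""
    # Iterating an Element yields its direct children.
    child_depth = max((table_depth(c) for c in elem), default=0)
    return child_depth + (1 if elem.tag == TBL else 0)


print(table_depth(ET.fromstring(SAMPLE)))  # 2
```

A body containing only paragraphs reports a depth of 0, and each additional level of table-within-table adds 1.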
Runs
A “run” is the basic leaf container for a document’s content and is represented by the w:r element. As we’ve seen, the w:r element may contain w:t elements, which contain text. Including the w:t element, there are 24 valid child elements of the w:r element, representing things like text, images, deleted text, hyphens, breaks, tabs, footnotes, endnotes, footnote and endnote references, page numbers, field text, etc. We’ll look at just a few of these.
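As a rough illustration of how a consumer might treat a few of these children, the sketch below flattens a single w:r to plain text. It assumes the Word 2003 namespace URI and the w:t, w:tab, and w:br element names. Mapping every w:br to a newline is our own simplification, because Word also uses w:br for other kinds of breaks, such as page breaks. Children that the sketch does not handle are simply skipped. The function name run_text is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"

# Assumed plain-text stand-ins for two run children. Treating every w:br
# as "\n" is a simplification; other break kinds are not distinguished.
SIMPLE_CHILDREN = {
    f"{{{W}}}tab": "\t",
    f"{{{W}}}br": "\n",
}


def run_text(run):
    """Flatten one w:r element to plain text, skipping unhandled children."""
    parts = []
    for child in run:
        if child.tag == f"{{{W}}}t":
            parts.append(child.text or "")
        elif child.tag in SIMPLE_CHILDREN:
            parts.append(SIMPLE_CHILDREN[child.tag])
        # Anything else (images, footnote references, ...) is ignored here.
    return "".join(parts)


run = ET.fromstring(
    f'<w:r xmlns:w="{W}"><w:t>Name:</w:t><w:tab/><w:t>Value</w:t><w:br/></w:r>'
)
print(repr(run_text(run)))  # 'Name:\tValue\n'
```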
The w:r element may occur in five different element contexts: w:p, w:fldSimple, w:hlink, w:rt, and w:rubyBase. The first one, the paragraph, is the most common. The w:fldSimple element represents a Word field, the w:hlink element represents a hyperlink in Word, and the w:rt (“ruby text”) and w:rubyBase elements ...
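To see these contexts in a parsed document, the sketch below counts which elements directly contain each w:r. The input is a simplified, hand-written paragraph that exercises three of the five contexts: w:p, w:fldSimple, and w:hlink. The attributes on w:fldSimple and w:hlink are included only for realism and are never read by the code. As before, the namespace URI is assumed, and the function name run_contexts is hypothetical. ElementTree elements have no parent pointers, so the code inspects each element's children instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter

W = "http://schemas.microsoft.com/office/word/2003/wordml"
PREFIX = f"{{{W}}}"

# Simplified paragraph with runs in three of the five contexts.
# The attribute values are illustrative and are not used below.
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:t>Page </w:t></w:r>
  <w:fldSimple w:instr=" PAGE "><w:r><w:t>1</w:t></w:r></w:fldSimple>
  <w:hlink w:dest="http://example.com/"><w:r><w:t>link</w:t></w:r></w:hlink>
</w:p>"""


def run_contexts(root):
    """Count the (w:-prefixed) names of elements that directly contain w:r."""
    counts = Counter()
    # No parent pointers in ElementTree, so check every element's children.
    for parent in root.iter():
        for child in parent:
            if child.tag == PREFIX + "r":
                counts[parent.tag.replace(PREFIX, "w:")] += 1
    return counts


print(run_contexts(ET.fromstring(SAMPLE)))
```

Each of the three contexts is counted once here, and a real document would normally be dominated by the w:p context.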