The Text Encoding Initiative (TEI, http://www.tei-c.org/) is an XML (originally SGML) application designed for the markup of classic literature, such as Vergil’s Aeneid or the collected works of Thomas Jefferson. It’s a prime example of a narrative-oriented DTD. Because TEI is designed for scholarly analysis of text rather than for casual reading or publishing, it includes elements not only for common document structures (chapter, scene, stanza, etc.) but also for typographical features, grammatical structure, the position of illustrations on the page, and so forth. These details don’t matter to most readers, but they do matter to TEI’s intended audience of humanities scholars. For many academic purposes, one manuscript of the Aeneid is not interchangeable with the next: transcription errors and emendations made by various monks in the Middle Ages can be crucial.

Example 6-1 shows a fairly simple TEI document that uses the “Lite” version of TEI, a subset of full TEI that includes only the most commonly needed tags. The content comes from the book you’re reading now. Although a complete TEI-encoded copy of this manuscript would be much longer, this simple example demonstrates the basic features of most TEI documents that represent books. (In addition to prose, TEI can also be used for plays, poems, missals, and essentially any written form of literature.)

Example 6-1. A TEI document
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEI.2 SYSTEM "xteilite.dtd">
...
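
To give a feel for how a program might walk the structure of a book-like TEI document, here is a minimal Python sketch. It does not reproduce or continue Example 6-1. The embedded fragment is hypothetical, written only for this illustration, and assumes the common TEI Lite arrangement of a teiHeader followed by a text element whose body holds div1 chapters. The function name outline and all sample titles and text are likewise invented for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI Lite-style fragment, invented for this sketch.
# It is NOT the content of Example 6-1, and it omits the DOCTYPE so
# that no external DTD is needed to parse it.
SAMPLE = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample Book</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Written for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div1 type="chapter">
        <head>First Chapter</head>
        <p>Opening paragraph.</p>
        <p>Second paragraph.</p>
      </div1>
      <div1 type="chapter">
        <head>Second Chapter</head>
        <p>Only paragraph.</p>
      </div1>
    </body>
  </text>
</TEI.2>"""


def outline(xml_text):
    """Return the title and a list of (chapter heading, paragraph count)."""
    root = ET.fromstring(xml_text)
    # The header carries metadata about the text, including its title.
    title = root.findtext("teiHeader/fileDesc/titleStmt/title")
    # Each div1 directly under body is treated as one chapter here.
    chapters = [
        (div.findtext("head"), len(div.findall("p")))
        for div in root.findall("text/body/div1")
    ]
    return title, chapters


title, chapters = outline(SAMPLE)
print(title)
for head, count in chapters:
    print(f"{head}: {count} paragraph(s)")
```

The sketch reads only structural markup (title, chapter headings, paragraphs). A scholarly application would typically also look at the typographical and editorial elements that distinguish TEI from simpler narrative formats.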
