Docbase records are semistructured. Each has parts that correspond roughly to the header and the body of an email message. The header fields of a record contain values that are typically compact, such as names or dates. These values often belong to controlled vocabularies, such as names of companies, products, or authors. Header fields provide the hooks we’ll use in Chapter 7 to build navigational indexes for docbases and in Chapter 8 to organize search results.
The body fields of a record contain free-form text. They often exhibit patterns, such as URLs, that provide hooks for the kinds of instrumentation we saw in Chapter 5. Body fields and header fields alike are subject to full-text search. But unlike header fields, body fields don’t provide hooks for building navigational indexes.
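To make the distinction concrete, here is a minimal sketch in Python. It is an illustration only, not the Docbase implementation: the record layout, the field names (company, author), the sample values, and the helpers build_index and full_text_search are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical records: compact header values plus free-form body text.
records = [
    {"header": {"company": "Acme", "author": "Lee"},
     "body": "Acme's new server is described at http://example.com/acme."},
    {"header": {"company": "Globex", "author": "Lee"},
     "body": "Globex ships a rival server next quarter."},
]

def build_index(records, field):
    """Group record positions by the value of one header field."""
    index = defaultdict(list)
    for pos, record in enumerate(records):
        value = record["header"].get(field)
        if value is not None:
            index[value].append(pos)
    return dict(index)

def full_text_search(records, term):
    """Return positions of records whose header values or body contain term."""
    term = term.lower()
    hits = []
    for pos, record in enumerate(records):
        # Search covers header values and body text alike.
        text = " ".join(record["header"].values()) + " " + record["body"]
        if term in text.lower():
            hits.append(pos)
    return hits
```

In this sketch, build_index works only on header fields because their values are compact and repeat across records, which is what makes grouping useful. The search helper treats header and body text the same way.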
In this chapter, we’ll focus on a Docbase instance called ProductAnalysis. Its records are reports, written by industry analysts, that assess high-tech products. The creation of a record is a shared responsibility. A manager assigns a report to an analyst, specifying some of the header fields. These manager-specified header fields are as follows:
Date of assignment
Due date for report
Name of company
Name of product (optional, may be supplied by analyst)
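As a rough illustration, the manager-specified header fields listed above could be modeled like this in Python. The class name Assignment, the attribute names, and the sample values are hypothetical and chosen only for this sketch. They are not taken from the ProductAnalysis docbase itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Assignment:
    """Header fields a manager fills in when assigning a report (names are assumed)."""
    assigned: date                 # date of assignment
    due: date                      # due date for report
    company: str                   # name of company
    product: Optional[str] = None  # optional; may be supplied later by the analyst

# Hypothetical usage: the manager leaves the product name blank,
# and the analyst fills it in afterward.
assignment = Assignment(assigned=date(1999, 1, 4), due=date(1999, 1, 18), company="Acme")
assignment.product = "Acme Server"
```

Giving the product field a default of None is one way to express that it is the only header field the manager may leave for the analyst.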
There are four analyst-supplied body fields: the report title, a summary of the report (a sentence or paragraph), the full report (many paragraphs), and a chunk of contact information (names, phone numbers, ...