Anatomy of a Docbase Record

Docbase records are semistructured. Each has parts that correspond roughly to the header and the body of an email message. The header fields of a record contain values that are typically compact, such as names or dates. These values often belong to controlled vocabularies, such as the names of companies, products, or authors. Header fields provide the hooks we’ll use in Chapter 7 to build navigational indexes for docbases and in Chapter 8 to organize search results.

The body fields of a docbase record contain free-form text. They often exhibit patterns, such as URLs, that provide hooks for the kinds of instrumentation we saw in Chapter 5. Like header fields, body fields are subject to full-text search. Unlike header fields, they don’t provide hooks for building navigational indexes.
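To make the distinction concrete, here is a minimal Python sketch. The record layout, the field names, the sample values, and the two helper functions are all assumptions made for illustration and are not taken from the book's implementation. It shows a navigational index built from one header field, and a full-text search that scans header and body fields alike.

```python
from collections import defaultdict

# Hypothetical records: every field name and value here is invented.
records = [
    {"header": {"company": "Acme", "analyst": "Pat Lee"},
     "body": {"report": "The widget ships next quarter. See http://example.com/widget"}},
    {"header": {"company": "Globex", "analyst": "Sam Roy"},
     "body": {"report": "A capable server with weak documentation."}},
    {"header": {"company": "Acme", "analyst": "Sam Roy"},
     "body": {"report": "The gadget improves on the earlier widget."}},
]


def build_index(records, field):
    """Group record positions by the value of one header field."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        value = rec["header"].get(field)
        if value is not None:
            index[value].append(pos)
    return dict(index)


def full_text_search(records, term):
    """Return positions of records whose header or body text contains term."""
    term = term.lower()
    hits = []
    for pos, rec in enumerate(records):
        texts = list(rec["header"].values()) + list(rec["body"].values())
        if any(term in str(text).lower() for text in texts):
            hits.append(pos)
    return hits
```

Indexing works on header fields because their values repeat across records. Free-form body text rarely repeats verbatim, so grouping by it would produce one entry per record, which is why the sketch only searches it.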

In this chapter, we’ll focus on a Docbase instance called ProductAnalysis. Its records are reports, written by industry analysts, that assess high-tech products. The creation of a record is a shared responsibility. A manager assigns a report to an analyst, specifying some of the header fields. These manager-specified header fields are as follows:

  • Analyst’s name

  • Date of assignment

  • Due date for report

  • Name of company

  • Name of product (optional, may be supplied by analyst)

There are four analyst-supplied body fields: the report title, a summary of the report (a sentence or paragraph), the full report (many paragraphs), and a chunk of contact information (names, phone numbers, ...
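The split between manager-specified header fields and analyst-supplied body fields could be modeled roughly as follows. This is only a sketch: the class name, the attribute names, and the helper method are hypothetical and do not come from the Docbase system itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ReportRecord:
    # Header fields, specified by the manager when the report is assigned.
    analyst: str
    assigned: date
    due: date
    company: str
    product: Optional[str] = None  # optional; the analyst may supply it later

    # Body fields, supplied by the analyst (names are assumptions).
    title: str = ""
    summary: str = ""
    full_report: str = ""
    contact_info: str = ""

    def body_is_empty(self) -> bool:
        """True while the analyst has not yet filled in any body field."""
        return not any(
            (self.title, self.summary, self.full_report, self.contact_info)
        )
```

Under this model, a manager would create the record with only the required header fields, for example `ReportRecord(analyst="Pat Lee", assigned=date(1999, 1, 4), due=date(1999, 1, 18), company="Acme")` (the values are invented), and the analyst would fill in the remaining fields afterward.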
