Chapter 7

Textbases

Abstract

Textbase is the current buzzword for document management systems, which deal with data kept as text rather than as traditional structured data, relationships, or temporal models. Text is the oldest form of data we use. Documents can be free text or semi-structured. The problem is that text can be treated as strings that have only syntax; that is, patterns of characters that can be mathematically defined and mechanically manipulated by relatively simple algorithms. However, words have semantics; interpreting them requires human judgment or insanely complicated algorithms that are able to learn and make humanlike judgments. Most of the important business rules (laws, contracts, rules, definitions, communications, etc.) are ...
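
The gap between syntax and semantics described in the abstract can be illustrated with a small sketch. It is not taken from the book: the sample sentences, the pattern, and the helper name `find_matches` are assumptions made up for illustration, and Python's standard `re` module stands in for the "relatively simple algorithms" that manipulate text as character patterns.

```python
import re

# Hypothetical sample documents (invented for this illustration).
DOCUMENTS = [
    "The bank approved the loan.",
    "We had a picnic on the river bank.",
    "Banking regulations changed last year.",
]

# A purely syntactic rule: the whole word "bank", in any letter case.
BANK_WORD = re.compile(r"\bbank\b", re.IGNORECASE)


def find_matches(pattern, documents):
    """Return the documents that contain at least one match for the pattern."""
    return [doc for doc in documents if pattern.search(doc)]


matches = find_matches(BANK_WORD, DOCUMENTS)
for doc in matches:
    print(doc)
```

The pattern selects the first two sentences because both contain the character sequence "bank" as a whole word, and it skips the third because "Banking" is a different string. Nothing in the match says that one sentence is about a financial institution and the other about the edge of a river; telling those apart is the semantic part of the problem.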

Get Joe Celko’s Complete Guide to NoSQL now with O’Reilly online learning.
