Chapter 7

Textbases

Abstract

Textbase is the current buzzword for document management systems, which deal with data kept as text rather than in traditional structured, relational, or temporal models. Text is the oldest form of data we use. Documents can be free text or semi-structured. The problem is that text can be treated as strings that have only syntax; that is, patterns of characters that can be mathematically defined and mechanically manipulated by relatively simple algorithms. However, words also have semantics, and handling meaning requires either human judgment or insanely complicated algorithms that are able to learn and make humanlike judgments. Most of the important business rules (laws, contracts, rules, definitions, communications, etc.) are ...
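
The gap between syntax and semantics can be illustrated with a minimal sketch. The sample sentences, the `find_matches` helper, and the choice of the word "bank" are all assumptions made up for this illustration; they do not come from the chapter. The sketch uses Python's standard `re` module to treat text purely as a pattern of characters.

```python
import re

# Hypothetical sample documents, invented for this illustration.
documents = [
    "The bank approved the loan on Friday.",
    "We had a picnic on the river bank.",
]

# A purely syntactic pattern: the whole word "bank", ignoring case.
pattern = re.compile(r"\bbank\b", re.IGNORECASE)


def find_matches(docs, pat):
    """Return (document index, character offset) for every pattern match."""
    return [
        (i, m.start())
        for i, doc in enumerate(docs)
        for m in pat.finditer(doc)
    ]


matches = find_matches(documents, pattern)
print(matches)
```

Both sentences match, because the pattern sees only characters. Nothing in the match tells us that the first "bank" is a financial institution and the second is the side of a river; that distinction is the semantic part, and it is where the simple algorithm stops being enough.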

Get Joe Celko’s Complete Guide to NoSQL now with the O’Reilly learning platform.
