Building the Unstructured Data Warehouse: Architecture, Analysis, and Design by Krish Krishnan, W. H. Inmon

The homogeneity of a document refers to how similar the document is to another document in terms of text format. Text formats can be homogeneous, semi-homogeneous, or non-homogeneous. A resume is homogeneous to another resume. A novel is semi-homogeneous to a textbook. A contract is non-homogeneous to an x-ray.

The content of articles, for example, is notoriously non-homogeneous. Consider the following publications: the National Enquirer, The New York Times, Playboy magazine, and Architectural Digest. Articles from these publications are likely to have few similarities among them. The language, the terms, the subjects discussed, and the style of writing are usually very dissimilar. When addressing very dissimilar ...
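One rough way to put a number on how dissimilar the terms of two articles are is to compare their vocabularies. The sketch below is an illustration only, not a method taken from the book: it assumes that overlap in distinct words (Jaccard similarity) is an acceptable stand-in for one aspect of homogeneity, and the function names and sample sentences are hypothetical.

```python
import re


def term_set(text: str) -> set[str]:
    """Lowercase the text and return its set of distinct alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Return shared terms divided by all terms across both texts (0.0 if both are empty)."""
    terms_a, terms_b = term_set(a), term_set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Made-up snippets for illustration; not quotations from any publication.
tabloid = "Star spotted at beach with mystery date"
design = "The villa pairs travertine floors with oak beams"

print(jaccard_similarity(tabloid, tabloid))  # identical texts share every term
print(jaccard_similarity(tabloid, design))   # only one term in common
```

A score near 1 suggests largely shared vocabulary, and a score near 0 suggests very little. Vocabulary overlap says nothing about format, subject, or writing style, so it captures only part of what the passage means by homogeneity.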
