Chapter 4.3

Parallel Processing

Abstract

There are different definitions of big data. The definition used here is that big data encompasses a lot of data, is based on inexpensive storage, manages data by the “Roman census” method, and stores data in an unstructured format. There are two major types of big data—repetitive big data and nonrepetitive big data. Only a small fraction of repetitive big data has business value, whereas almost all of nonrepetitive big data has business value. In order to achieve business value, the context of data in big data must be determined. Contextualization of repetitive big data is easily achieved. But contextualization of nonrepetitive data is done by means of textual disambiguation.

Keywords

Big data; Roman ...

Get Data Architecture: A Primer for the Data Scientist, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.