Chapter 2. Moving Toward Scalable Data Unification

The early users of data management systems performed business data processing, mostly transactions (updates) and queries on the underlying datasets. These early applications enabled analytics on the current state of the enterprise. About two decades ago, enterprises began keeping historical transactional data in what came to be called data warehouses. Such systems enabled the use of analytics to find trends over time; for example, pet rocks are out and Barbie dolls are in. Every large enterprise now has a data warehouse, on which business analysts run queries to find useful information.

The concept has been so successful that enterprises now typically have several, and often many, analytical data stores. To perform cross-selling, obtain a single view of a customer, or find the best pricing from many supplier data stores, it is necessary to perform data unification across a collection of independently constructed data stores.

This chapter discusses the history of data unification and current issues.

A Brief History of Data Unification Systems

ETL systems were used early on to integrate data stores. Given the required amount of effort by a skilled programmer, ETL systems typically unified only a handful of data stores, fewer than two dozen in most cases. The bottleneck in these systems was the human time required to transform the data into a common format for the destination repository—it was necessary to write “merge ...
