CHAPTER 8 The Evolution of Data Lakes as a Fundamental Part of a New Ecosystem Architecture

Pervasive intelligence requires an intelligent, reliable information architecture. The foundation of an intelligent information architecture is the enterprise data warehouse (DW)—or at a minimum, function-oriented DWs that rely on the same governance structure and data definitions. Traditional DW implementations follow a set approach: define the requirements, identify (structured) data sources, define the schema, load and format the data, then distribute the data through presentation layers.
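To make that sequence concrete, here is a minimal sketch of the schema-first approach using Python's built-in sqlite3 module as a stand-in for a warehouse. The table name `sales`, the view `sales_by_region`, the column names, and the sample rows are all hypothetical and chosen only for illustration; a real DW would use a dedicated platform and far richer data definitions.

```python
import sqlite3


def build_warehouse(rows):
    """Define the schema first, then load and format the data to fit it."""
    conn = sqlite3.connect(":memory:")

    # Define the schema up front (hypothetical table and columns).
    conn.execute(
        """
        CREATE TABLE sales (
            order_id   INTEGER PRIMARY KEY,
            region     TEXT NOT NULL,
            amount_usd REAL NOT NULL CHECK (amount_usd >= 0)
        )
        """
    )

    # Load and format: coerce each source row to the defined types.
    cleaned = [(int(o), r.strip().upper(), float(a)) for o, r, a in rows]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)

    # Presentation layer: a view that every user queries the same way.
    conn.execute(
        """
        CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount_usd) AS total_usd
        FROM sales
        GROUP BY region
        """
    )
    return conn


# Hypothetical structured source data, still in raw string form.
source_rows = [("1", " east ", "120.50"), ("2", "West", "80"), ("3", "east", "39.50")]

conn = build_warehouse(source_rows)
report = conn.execute(
    "SELECT region, total_usd FROM sales_by_region ORDER BY region"
).fetchall()
print(report)
```

Because the schema is fixed before any data arrives, every consumer of the view sees the same definitions, which is what makes a single version of the numbers possible.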

You can use historical transaction or event data in your DW to figure out what happened and why it might have happened. Based on available data, you can form hypotheses and test them against the data in the warehouse to either confirm or refute your suspicions. A DW can also democratize business intelligence (BI), putting the power of analysis in the hands of less tech-savvy users and presenting them with a single version of the “truth,” no matter who’s asking.


Data warehouses work well as long as the data fits what you’ve defined, but no one needs to tell you that the nature of data has changed significantly over the past decade. The volume of unstructured and streaming data has far surpassed that of traditional, structured data. That flood has given rise to a new repository: the data lake.

Data lakes store unstructured data and format it when it’s ...
