Chapter 1. The Modern Data Landscape and Its Impact on Data Engineering
Major changes in data, data management systems, and data consumption patterns are at the heart of why engineering modern data pipelines is so challenging. In this chapter, we will cover these changes and their implications for data engineers.
Modern Data Sources
Data sources used to be limited to systems supporting business functions such as sales, marketing, manufacturing, and finance. These systems primarily recorded transactions. The data was usually sent to a central database or data warehouse each night so that reports could be run for use the next day. Besides “canned” reports, users could also run ad hoc queries against the data warehouse or a data mart.
Digital transformation efforts over the past decade mean that now we can access extremely large amounts of interaction data on top of this transaction data. Interactions are events that influence whether a transaction occurs and the characteristics of the transaction. Interactions include clickstreams that record website, social media, advertising, or app interactions; logged events on digital systems such as smartphones, servers, and network equipment; and measurements from the physical world collected by Internet of Things (IoT) sensors embedded in factories and infrastructure, in buildings, or in consumer products.
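To make the distinction concrete, here is a minimal sketch of how the two kinds of data might be modeled. The names `Transaction`, `Interaction`, and `context_for`, along with all fields and sample values, are hypothetical and chosen only for illustration; real systems would carry far richer schemas and would likely link records by session or device rather than by customer ID alone.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """A recorded business event, such as a completed sale (hypothetical schema)."""
    order_id: str
    customer_id: str
    amount: float
    completed_at: datetime


@dataclass(frozen=True)
class Interaction:
    """An event that may influence a transaction, such as a click (hypothetical schema)."""
    customer_id: str
    event_type: str
    occurred_at: datetime


def context_for(transaction, interactions):
    """Return the same customer's interactions up to the transaction time, oldest first."""
    prior = [
        event for event in interactions
        if event.customer_id == transaction.customer_id
        and event.occurred_at <= transaction.completed_at
    ]
    return sorted(prior, key=lambda event: event.occurred_at)


# Illustrative sample data only.
sale = Transaction("o-1", "c-42", 59.90, datetime(2024, 1, 5, 12, 30))
events = [
    Interaction("c-42", "ad_click", datetime(2024, 1, 5, 12, 1)),
    Interaction("c-42", "product_view", datetime(2024, 1, 5, 12, 10)),
    Interaction("c-7", "product_view", datetime(2024, 1, 5, 12, 15)),
    Interaction("c-42", "support_chat", datetime(2024, 1, 6, 9, 0)),
]
event_types = [event.event_type for event in context_for(sale, events)]
print(event_types)
```

In this sketch, a single transaction is linked to the two interactions by the same customer that preceded it; the other customer's event and the later support chat are excluded.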
A transaction is a business fact. Interaction data provides the context. In terms of size and complexity, interaction ...