Chapter 2. How We Got Here

Let’s begin by looking back and gaining a little understanding of the data processing landscape. The goal here will be to get to know some of the expectations, players, and tools in the industry.

I’ll first run through a brief history of the tools used throughout the past 20 years of data processing. Then we’ll look at producer and consumer use cases, followed by a discussion of the issue of scale.

Excel Spreadsheets

Yes, we’re talking about Excel spreadsheets: the software that ran on Intel 386 computers, which had a tiny fraction of the computing power of even today’s cell phones.

So why are Excel spreadsheets so important? Because of expectations. Spreadsheets were, and still are, many people’s first introduction to data organization, visualization, and processing. These first impressions leave lasting expectations about what working with data is like. Let’s dig into some of these aspects:


We take it for granted, but spreadsheets allowed us to see the data and its format and get a sense of its scale.


Group By, Sum, and Avg functions were easy to add and returned results in real time.


Getting data into graphs and charts was not only easy but provided quick iteration between changes to the query or the displays.

Decision making

Advanced Excel users could write functions that flagged cells in different colors based on rule conditions.
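To make these expectations concrete, here is a minimal Python sketch of the same ideas: grouping rows, computing a sum and an average per group, and flagging results with a color based on simple rules. The sample rows, the function names (`group_by`, `summarize`, `flag`), and the thresholds are all hypothetical and chosen only for illustration; this is not how Excel implements any of these features.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows, standing in for a small spreadsheet.
rows = [
    {"region": "east", "sales": 120},
    {"region": "west", "sales": 80},
    {"region": "east", "sales": 60},
    {"region": "west", "sales": 200},
]


def group_by(rows, key, value):
    """Collect the `value` column into lists, one list per distinct `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return dict(groups)


def summarize(groups):
    """Compute the sum and average for each group."""
    return {k: {"sum": sum(v), "avg": mean(v)} for k, v in groups.items()}


def flag(value, low=100, high=150):
    """Rule-based coloring, loosely like conditional formatting.

    The thresholds are assumptions made up for this example.
    """
    if value < low:
        return "red"
    if value > high:
        return "green"
    return "yellow"


summary = summarize(group_by(rows, "region", "sales"))
flags = {region: flag(stats["avg"]) for region, stats in summary.items()}

print(summary)
print(flags)
```

In a spreadsheet all of this happens interactively and in view of the data itself, which is exactly the experience that shapes people’s expectations.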

In short, everything we have today and everything discussed here ...
