Chapter 2. How We Got Here

Let’s begin by looking back and gaining a little understanding of the data processing landscape. The goal here will be to get to know some of the expectations, players, and tools in the industry.

I’ll first run through a brief history of the tools used throughout the past 20 years of data processing. Then I’ll look at producer and consumer use cases, followed by a discussion of the issue of scale.

Excel Spreadsheets

Yes, we’re talking about Excel spreadsheets: the software that ran on Intel 386 computers, which had nearly zero computing power compared to even today’s cell phones.

So why are Excel spreadsheets so important? Because of expectations. Spreadsheets were, and still are, the first introduction to data organization, visualization, and processing for a lot of people. These first impressions leave lasting expectations about what working with data is like. Let’s dig into some of these aspects:

Visualization

We take it for granted, but spreadsheets allowed us to see the data and its format and get a sense of its scale.

Functions

Group By, Sum, and Avg functions were easy to add and returned results in real time.

Graphics

Getting data into graphs and charts was not only easy but also allowed quick iteration between changes to the query and changes to the display.

Decision making

Advanced Excel users could write functions that flagged cells with different colors based on rule conditions.
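For readers who think in code, the spreadsheet habits listed above map onto a few lines of Python. The following is a minimal sketch, not anything taken from Excel itself: the sample rows, the `summarize` and `flag` helpers, and the threshold of 150 are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for a small spreadsheet.
rows = [
    {"region": "East", "sales": 120},
    {"region": "West", "sales": 80},
    {"region": "East", "sales": 60},
    {"region": "West", "sales": 40},
]


def summarize(rows, key, value):
    """Group rows by `key`, then compute the sum and average of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: {"sum": sum(v), "avg": mean(v)} for k, v in groups.items()}


def flag(total, threshold=150):
    """Mimic a conditional-formatting rule (the threshold is an assumption)."""
    return "green" if total >= threshold else "red"


summary = summarize(rows, "region", "sales")
flags = {region: flag(stats["sum"]) for region, stats in summary.items()}
```

Here `summary` plays the role of the Group By, Sum, and Avg functions, and `flags` plays the role of a color rule applied to each group total.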

In short, everything we have today and everything discussed here ...
