CHAPTER 5
Explore the Data
If you tell a data scientist to go on a fishing expedition … then you deserve what you get, which is a bad analysis.¹
Thomas C. Redman, “the Data Doc” and Harvard Business Review contributor
Data projects are never as simple as they appear in a boardroom presentation. Stakeholders typically see a polished PowerPoint presentation that follows a rigid script from question to data to answer. What's lost in that story, however, are all the ideas that didn't make the cut: the important decisions and assumptions the data team made along the way to arrive at their answer. A good data team does not follow a linear path but a meandering one, adapting to discoveries in the data. As they get further along in their journey, they circle back to earlier ideas, only to see multiple paths open as a result.
This process of iteration, discovery, and data scrutiny is known as exploratory data analysis (EDA). It was formulated by statistician John Tukey in the 1970s as a way to make sense of data with summary statistics and visualizations before applying more complex methods.² Tukey saw EDA as detective work. Clues are hidden in data, and the right exploration would reveal next steps to follow. Indeed, EDA is another way to “argue” with your data. It's a fundamental part of all data work that both sets and changes the direction of a project based on what's uncovered.
EXPLORATORY DATA ANALYSIS AND YOU
Exploratory data analysis can be an uncomfortable thought for some: ...