Chapter 5. Visualizations and Simple Metrics

A rule of thumb for data science deliverables is this: if there isn't a picture, then you're doing it wrong. Typically, a good analytics project starts (after cleaning and understanding the data) with exploratory visualizations that help you develop hypotheses and get a feel for the data, and it ends with carefully manicured figures that make the final results visually obvious. The actual number crunching is hidden in the middle, sometimes almost as an aside. I've had a number of projects that never involved any actual machine learning: people needed to know whether there was signal in the data and which directions were most promising for further work (which would potentially include machine learning), and graphics showed that more clearly than a number ever could.

This fact is very underappreciated outside of the data analysis community. Many people think of data scientists as numerical badasses, working black magic from a command line. But that's just not the way the human brain processes data, generates hypotheses, or develops familiarity with an area. Pictures are plans A–C for everything except the last stages of statistically validating results. I've often joked that if humans were able to visualize things in a thousand dimensions, then my job as a data scientist would consist entirely of generating and looking at scatterplots.

This chapter will take you through several of the most important visualizations. You've probably ...

