Searching for Outliers

In addition to looking at how units of data belong in certain groups, you should also be interested in how they don’t belong in groups. That is, there will often be data points that stand out from the rest, which are called, you guessed it, outliers. These are data points that are different from the rest of the population. Sometimes they could be the most interesting part of your story, or they could just be boring typos with a missing zero. Either way, you need to check them out to see what’s going on. You don’t want to make a giant graphic on the premise of an outlier, only to find out later from a diligent reader that your hard work makes no sense.

Some graphic types have been designed specifically to highlight outliers, but in my experience, nothing beats basic plots and common sense. Learn about the context of your data, do your homework, and ask experts about the data when you’re not sure about something. Once you find the outliers, you can use the same graphical techniques that we’ve used so far to highlight them for readers: use varied colors, provide pointers, or use thicker borders.
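As a rough illustration of that last step, here is a minimal Python sketch that flags suspicious values and assigns them a highlight color. It uses the common 1.5 × IQR rule of thumb, which is an assumption on my part and not a method prescribed in this chapter; the temperature values and the function name `flag_outliers` are made up for the example. Treat whatever it flags as a prompt to go check the data, not as a verdict.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside the IQR fences."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical daily high temperatures; 7.2 could be a typo for 72.
temps = [71, 73, 70, 72, 7.2, 74, 69, 75, 72, 71]

flags = flag_outliers(temps)

# One color per point: flagged values stand out, the rest stay muted.
colors = ["red" if flagged else "gray" for flagged in flags]

# Positions and values worth a second look.
suspects = [(i, v) for i, (v, flagged) in enumerate(zip(temps, flags)) if flagged]
print(suspects)
```

The `colors` list has one entry per data point, so it can be handed to whatever plotting tool you use to draw the flagged points differently from the rest.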

Now look at a simple example. Figure 7-32 is a time series plot of weather data scraped from Weather Underground (like you did in Chapter 2, “Handling Data”), from 1980 to 2005. There are seasonal cycles like you’d expect, but what’s going on in the middle? It seems to be unusually smooth, whereas the rest of the data has some noise. This is nothing to go crazy over, ...
