Chapter 3
The Data Effect
A Glut at the End of the Rainbow

We are up to our ears in data, but how much can this raw material really tell us? What actually makes it predictive? What are the most bizarre discoveries from data? When we find an interesting insight, why are we often better off not asking why? In what way is bigger data more dangerous? How do we avoid being fooled by random noise and ensure scientific discoveries are trustworthy?

Spotting the big data tsunami, analytics enthusiasts exclaim, “Surf's up!”

We've entered the golden age of predictive discoveries. A frenzy of number crunching churns out a bonanza of colorful, valuable, and sometimes surprising insights:

  • People who “like” curly fries on Facebook are more intelligent.
  • Typing with proper capitalization indicates creditworthiness.
  • Users of the Chrome and Firefox browsers make better employees.
  • Men who skip breakfast are at greater risk for coronary heart disease.
  • The demand for Pop-Tarts spikes before a hurricane.
  • Female-named hurricanes are more deadly.
  • High-crime neighborhoods demand more Uber rides.

A Cautionary Tale: Orange Lemons

Look like fun? Before you dive in, be warned: This spree of data exploration must be tamed with strict quality control. It's easy to get it wrong and end up with egg on your face.

In 2012, a Seattle Times article led with an eye-catching predictive discovery: “An orange used ...
