Chapter 4
What You Should Know About Data
For statisticians (and economists too), the term data mining once had a pejorative meaning. Instead of “to find useful patterns in large volumes of data,” data mining had the connotation of searching for data to fit preconceived ideas. This practice is much like what politicians do around election time: search for data to show the success of their deeds. It is certainly not what the authors mean by data mining!
This chapter is intended to bridge some of the gaps between traditional statistics and data mining. The two disciplines are very similar. Statisticians and data miners commonly use many of the same techniques, and statistical software vendors now include many of the techniques described throughout this book in their software packages. Data miners should have a foundation of knowledge in statistics.
It may surprise some readers that statistics developed as a discipline distinct from mathematics over the past century and a half. Its purpose was to help scientists make sense of observations and design experiments that yield the reproducible and accurate results associated with the scientific method. Because statistics is intimately tied to scientific understanding of the world, applying it to the scientific understanding of business is natural.
For almost all of this period, the issue was not too much data, but too little. Scientists had to figure out how to understand the world using data collected by hand in notebooks. These ...