Chapter 17. Data Management

You might wonder what a chapter on data management is doing in a book about statistics. The reason is simple: the practice of statistics usually involves analyzing data, and the validity of the statistical results depends in large part on the validity of the data analyzed. If you will be working with statistics, you therefore need to know something about data management, whether you will be performing the management tasks yourself or delegating them to someone else.

Oddly enough, data management is often ignored in statistics classes, as well as in many offices and labs; some professors and project managers seem to believe that data will magically organize itself into a usable form without human intervention. People who work with data on a daily basis take quite a different view. Many describe the relationship of data management to statistical analysis by invoking the 80/20 rule: as a rule of thumb, about 80% of the time devoted to working with data is spent preparing the data for analysis, and only about 20% is spent actually analyzing it.

In my view, data management consists of both a general approach to the problem and the knowledge of how to perform a number of specific tasks. Both can be taught and learned, and although it’s true that some people can pick up this knowledge on an informal basis (through the college of hard knocks, so to speak), there is no good reason to leave such matters up to chance. Instead, it makes more sense ...

