Chapter 10. Subtle Sources of Bias and Error

Jonathan A. Schwabish

Please note: The views expressed in this chapter are those of the author and should not be interpreted as those of the Congressional Budget Office.

Before we get started, let me be clear: I get to work with some of the best socioeconomic data in the world. I have access to data provided by the U.S. Social Security Administration (SSA), which provides information on earnings and government benefits (specifically, Social Security benefits) for a huge number of people over a large number of years. The data is provided to the government through workers’ W-2 tax forms or other government records. By comparison, survey data is often collected from interviews between an interviewer and a respondent, but may also be collected online or through computer interfaces in which the interviewer and respondent never interact. Administrative data is becoming more widely available in many social science fields, and while that availability is enabling researchers to ask new and interesting questions, it has also raised new questions about various sources of bias and error in survey data.

Administrative data has both advantages and disadvantages relative to publicly available survey data. The major advantage is that administrative data tends to be more accurate than survey data because it is not subject to the typical errors found in survey data. Such errors include:

  • Nonresponse (the respondent fails to ...
