5 InfoQ at the postdata collection stage

5.1 Introduction

In Chapter 4, we examined factors affecting the predata collection (study design) stage, which yield low InfoQ and a dataset X that is related to the target dataset X*. That chapter presented a range of methods for increasing InfoQ at the predata collection stage.

In this chapter, we turn to the later stage of an empirical study, after the data has been collected. The data may have been collected by the researcher for the purpose of the study (primary data) or otherwise (secondary and semisecondary data). The data may be observational or experimental. Moreover, the study may have revised goals or even revised utility. These changes affect how the data is analyzed in order to achieve high InfoQ for the study.

We begin by laying out key points about primary, secondary, and semisecondary data, as well as revised goals and revised utility. We then move to a discussion of existing methods and approaches designed to increase information quality at the postdata collection stage. The methods range from “fixing” the data to combining data from multiple studies to imputing missing data. In some cases we can directly model the distortion between X and X*. For the different methods discussed here, we examine the relationship between the target dataset X* and the actual dataset X as a function of both a priori causes, η1, and a posteriori causes, η2, through the relationship X = η2{η1(X*)}. Each approach is designed to increase InfoQ ...
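As a rough illustration of the composition X = η2{η1(X*)}, the following Python sketch applies two made-up distortions in sequence to a simulated target dataset. The functions `eta1` and `eta2`, the cutoff of 80, and the 20% missingness rate are assumptions chosen for the example, not mechanisms taken from the text: `eta1` stands in for an a priori cause (a sampling frame that excludes high values) and `eta2` for an a posteriori cause (values lost after collection).

```python
import random
from statistics import mean

def eta1(target):
    """Hypothetical a priori cause: the sampling frame only
    covers units whose value is below 80 (selection effect)."""
    return [v for v in target if v < 80]

def eta2(collected, rng, missing_rate=0.2):
    """Hypothetical a posteriori cause: each collected value is
    lost with probability `missing_rate` and recorded as None."""
    return [None if rng.random() < missing_rate else v for v in collected]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated target dataset X* (assumed, for illustration only).
x_star = [rng.gauss(70, 10) for _ in range(1000)]

# Actual dataset X = eta2{eta1(X*)}: a priori distortion first,
# then the a posteriori distortion applied to its output.
x = eta2(eta1(x_star), rng)

# Values that survive both distortions and are available for analysis.
observed = [v for v in x if v is not None]

print(f"target size: {len(x_star)}, observed size: {len(observed)}")
print(f"target mean: {mean(x_star):.2f}, observed mean: {mean(observed):.2f}")
```

In this toy setup the observed mean falls below the target mean because of `eta1`, while `eta2` mainly reduces the amount of usable data. Separating the two causes in this way suggests which postdata remedy might apply to each, for example adjusting for selection versus imputing missing values.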
