
Information Quality

Book Description

Provides an important framework for data analysts in assessing the quality of data and its potential to provide meaningful insights through analysis.

Analytics and statistical analysis have become pervasive topics, mainly due to the growing availability of data and analytic tools. Technology, however, fails to deliver insights with added value if the quality of the information it generates is not assured. Information Quality (InfoQ) is a tool developed by the authors to assess the potential of a dataset to achieve a goal of interest, using data analysis. Whether the information quality of a dataset is sufficient is of practical importance at many stages of the data analytics journey, from the pre-data collection stage to the post-data collection and post-analysis stages. It is also critical to various stakeholders: data collection agencies, analysts, data scientists, and management.

This book:

  • Explains how to integrate the notions of goal, data, analysis and utility that are the main building blocks of data analysis within any domain.
  • Presents a framework for integrating domain knowledge with data analysis.
  • Provides a combination of both methodological and practical aspects of data analysis.
  • Discusses issues surrounding the implementation and integration of InfoQ in both academic programmes and business/industrial projects.
  • Showcases numerous case studies in a variety of application areas such as education, healthcare, official statistics, risk management and marketing surveys.
  • Presents a review of software tools from the InfoQ perspective along with example datasets on an accompanying website.

This book will be beneficial for researchers in academia and in industry, analysts, consultants, and agencies that collect and analyse data, as well as for undergraduate and postgraduate courses involving data analysis.

Table of Contents

  1. Cover
  2. Title Page
  3. Foreword
  4. About the authors
  5. Preface
    1. References
  6. Quotes about the book
  7. About the companion website
  8. Part I: THE INFORMATION QUALITY FRAMEWORK
    1. 1 Introduction to information quality
      1. 1.1 Introduction
      2. 1.2 Components of InfoQ
      3. 1.3 Definition of information quality
      4. 1.4 Examples from online auction studies
      5. 1.5 InfoQ and study quality
      6. 1.6 Summary
      7. References
    2. 2 Quality of goal, data quality, and analysis quality
      1. 2.1 Introduction
      2. 2.2 Data quality
      3. 2.3 Analysis quality
      4. 2.4 Quality of utility
      5. 2.5 Summary
      6. References
    3. 3 Dimensions of information quality and InfoQ assessment
      1. 3.1 Introduction
      2. 3.2 The eight dimensions of InfoQ
      3. 3.3 Assessing InfoQ
      4. 3.4 Example: InfoQ assessment of online auction experimental data
      5. 3.5 Summary
      6. References
    4. 4 InfoQ at the study design stage
      1. 4.1 Introduction
      2. 4.2 Primary versus secondary data and experiments versus observational data
      3. 4.3 Statistical design of experiments
      4. 4.4 Clinical trials and experiments with human subjects
      5. 4.5 Design of observational studies: Survey sampling
      6. 4.6 Computer experiments (simulations)
      7. 4.7 Multiobjective studies
      8. 4.8 Summary
      9. References
    5. 5 InfoQ at the postdata collection stage
      1. 5.1 Introduction
      2. 5.2 Postdata collection data
      3. 5.3 Data cleaning and preprocessing
      4. 5.4 Reweighting and bias adjustment
      5. 5.5 Meta‐analysis
      6. 5.6 Retrospective experimental design analysis
      7. 5.7 Models that account for data “loss”: Censoring and truncation
      8. 5.8 Summary
      9. References
  9. Part II: APPLICATIONS OF InfoQ
    1. 6 Education
      1. 6.1 Introduction
      2. 6.2 Test scores in schools
      3. 6.3 Value‐added models for educational assessment
      4. 6.4 Assessing understanding of concepts
      5. 6.5 Summary
      6. Appendix: MERLO implementation for an introduction to statistics course
      7. References
    2. 7 Customer surveys
      1. 7.1 Introduction
      2. 7.2 Design of customer surveys
      3. 7.3 InfoQ components
      4. 7.4 Models for customer survey data analysis
      5. 7.5 InfoQ evaluation
      6. 7.6 Summary
      7. Appendix: A posteriori InfoQ improvement for survey nonresponse selection bias
      8. References
    3. 8 Healthcare
      1. 8.1 Introduction
      2. 8.2 Institute of Medicine reports
      3. 8.3 Sant’Anna di Pisa report on the Tuscany healthcare system
      4. 8.4 The haemodialysis case study
      5. 8.5 The Geriatric Medical Center case study
      6. 8.6 Report of cancer incidence cluster
      7. 8.7 Summary
      8. References
    4. 9 Risk management
      1. 9.1 Introduction
      2. 9.2 Financial engineering, risk management, and Taleb’s quadrant
      3. 9.3 Risk management of OSS
      4. 9.4 Risk management of a telecommunication system supplier
      5. 9.5 Risk management in enterprise system implementation
      6. 9.6 Summary
      7. References
    5. 10 Official statistics
      1. 10.1 Introduction
      2. 10.2 Information quality and official statistics
      3. 10.3 Quality standards for official statistics
      4. 10.4 Standards for customer surveys
      5. 10.5 Integrating official statistics with administrative data for enhanced InfoQ
      6. 10.6 Summary
      7. References
  10. Part III: IMPLEMENTING InfoQ
    1. 11 InfoQ and reproducible research
      1. 11.1 Introduction
      2. 11.2 Definitions of reproducibility, repeatability, and replicability
      3. 11.3 Reproducibility and repeatability in GR&R
      4. 11.4 Reproducibility and repeatability in animal behavior studies
      5. 11.5 Replicability in genome‐wide association studies
      6. 11.6 Reproducibility, repeatability, and replicability: the InfoQ lens
      7. 11.7 Summary
      8. Appendix: Gauge repeatability and reproducibility study design and analysis
      9. References
    2. 12 InfoQ in review processes of scientific publications
      1. 12.1 Introduction
      2. 12.2 Current guidelines in applied journals
      3. 12.3 InfoQ guidelines for reviewers
      4. 12.4 Summary
      5. References
    3. 13 Integrating InfoQ into data science analytics programs, research methods courses, and more
      1. 13.1 Introduction
      2. 13.2 Experience from InfoQ integrations in existing courses
      3. 13.3 InfoQ as an integrating theme in analytics programs
      4. 13.4 Designing a new analytics course (or redesigning an existing course)
      5. 13.5 A one‐day InfoQ workshop
      6. 13.6 Summary
      7. Acknowledgements
      8. References
    4. 14 InfoQ support with R
      1. 14.1 Introduction
      2. 14.2 Examples of information quality with R
      3. 14.3 Components and dimensions of InfoQ and R
      4. 14.4 Summary
      5. References
    5. 15 InfoQ support with Minitab
      1. 15.1 Introduction
      2. 15.2 Components and dimensions of InfoQ and Minitab
      3. 15.3 Examples of InfoQ with Minitab
      4. 15.4 Summary
      5. References
    6. 16 InfoQ support with JMP
      1. 16.1 Introduction
      2. 16.2 Example 1: Controlling a film deposition process
      3. 16.3 Example 2: Predicting water quality in the Savannah River Basin
      4. 16.4 A JMP application to score the InfoQ dimensions
      5. 16.5 JMP capabilities and InfoQ
      6. 16.6 Summary
      7. References
  11. Index
  12. End User License Agreement