Chapter 5

Data Quality

Abstract

This chapter discusses data quality, which is a preliminary consideration for any commercial data analysis project; the definition of quality includes the availability or accessibility of data. The chapter examines typical problems that can occur with data, including errors in the data content (textual and numerical data) and the relevance and reliability of the data, as well as how to quantitatively evaluate data quality. Finally, some typical errors due to data extraction and how to avoid them are discussed by examining a practical case study.

Keywords

data quality

data extraction

availability

accessibility

relevance

reliability

Introduction

Data quality is a primary consideration for any commercial data analysis project, ...

Get Commercial Data Mining now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.