13.1. CHAPTER OBJECTIVES

  • Clearly understand why data quality is critical in a data warehouse

  • Observe the challenges posed by corrupt data and learn the methods to deal with them

  • Appreciate the benefits of quality data

  • Review the various categories of data quality tools and examine their usage

  • Study the implications of a data quality initiative and learn practical tips on data quality

Imagine a small, seemingly inconsequential error creeping into one of your operational systems. Suppose that, while entering customer data into that system, users consistently recorded erroneous sales region codes. The region codes for your customers are now all wrong. In the operational system, however, their accuracy may not matter much, because no customer invoices are mailed out using region codes. The codes were entered only for marketing purposes.

Now take the customer data to the next step and move it into the data warehouse. What is the consequence of this error? Every analysis your data warehouse users perform based on region codes will seriously misrepresent the business. An error that seems irrelevant in the operational systems can grossly distort the results from the data warehouse. This example may not seem to reflect the true state of affairs in many data warehouses, but you may be surprised to learn how common these kinds of problems are. Poor data quality in the source systems results in poor decisions by the ...
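To make the distortion concrete, here is a minimal Python sketch built on entirely hypothetical data: the customer IDs, region codes, sales amounts, and the reference list `VALID_REGIONS` are assumptions for illustration and do not come from this chapter. It totals sales by the region code as entered and by the intended region, and applies a simple domain check of the entered codes against the reference list.

```python
from collections import defaultdict

# Hypothetical reference list of valid sales region codes (illustrative assumption).
VALID_REGIONS = {"NE", "SE", "MW", "W"}

# Hypothetical customer records: (customer_id, entered_region, intended_region, sales).
records = [
    ("C1", "NE", "NE", 100.0),
    ("C2", "NE", "SE", 250.0),  # wrong code that still looks valid
    ("C3", "XX", "MW", 175.0),  # code outside the valid domain
    ("C4", "W", "W", 300.0),
]


def sales_by_region(rows, use_intended=False):
    """Sum sales per region, keyed by the entered or the intended region code."""
    totals = defaultdict(float)
    for _cid, entered, intended, amount in rows:
        totals[intended if use_intended else entered] += amount
    return dict(totals)


def invalid_codes(rows):
    """Return IDs of customers whose entered region code is not in VALID_REGIONS."""
    return [cid for cid, entered, _intended, _amount in rows
            if entered not in VALID_REGIONS]


if __name__ == "__main__":
    print("As loaded:  ", sales_by_region(records))
    print("As intended:", sales_by_region(records, use_intended=True))
    print("Failed domain check:", invalid_codes(records))
```

The domain check flags C3, whose code is not in the reference list, but not C2, whose code is valid yet wrong. An error of that second kind can only be caught by cross-checking against some other trusted source, which is one reason region-based analyses can be distorted without any obvious warning sign.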
