13.6. CHAPTER SUMMARY

  • Data quality is critical because it boosts confidence, enables better customer service, enhances strategic decision making, and reduces risks from disastrous decisions.

  • Data quality dimensions include accuracy, domain integrity, consistency, completeness, structural definiteness, clarity, and many more.

  • Data quality problems run the gamut of dummy values, missing values, cryptic values, contradicting values, business rule violations, inconsistent values, and so on.

  • Data pollution in a data warehouse comes from many sources, and this variety of sources makes cleaning up the data more challenging.

  • Poor data quality in names and addresses is a serious concern for organizations and one of the greatest data quality challenges.

  • Data cleansing tools contain useful error discovery and error correction features. Learn about them and make use of the tools applicable to your environment.

  • The DBMS itself can be used for data cleansing.

  • Set up a sound data quality initiative in your organization. Within that framework, make the data cleansing decisions.
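
Two of the points above, spotting dummy and missing values and using the DBMS itself for cleansing, can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not a method prescribed by the chapter: the set of dummy values, the function names profile_records and rejected_by_dbms, and the customer table with its two-character state rule are all hypothetical, and SQLite stands in for whatever DBMS your environment uses.

```python
import sqlite3

# Hypothetical placeholder values treated as "dummy" data (an assumption;
# real lists come from profiling your own source systems).
DUMMY_VALUES = {"999-99-9999", "N/A", "UNKNOWN", "XXXXX"}


def profile_records(records, required_fields):
    """Count missing and dummy values per field in a list of dict records."""
    report = {field: {"missing": 0, "dummy": 0} for field in required_fields}
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                report[field]["missing"] += 1
            elif str(value).strip().upper() in DUMMY_VALUES:
                report[field]["dummy"] += 1
    return report


def rejected_by_dbms(rows):
    """Let the DBMS enforce simple rules and return the rows it rejects.

    The table and its constraints are illustrative only: name must be
    present, and state must be a two-character code.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        " name TEXT NOT NULL,"
        " state TEXT NOT NULL CHECK (length(state) = 2)"
        ")"
    )
    rejected = []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO customer (name, state) VALUES (?, ?)", row
            )
        except sqlite3.IntegrityError:
            # NOT NULL and CHECK violations both raise IntegrityError.
            rejected.append(row)
    conn.close()
    return rejected
```

Calling profile_records on a batch of source records gives a per-field count of missing and dummy values, which can feed the data cleansing decisions. Calling rejected_by_dbms shows how declared constraints let the database refuse rows that violate domain rules, so they can be routed for correction rather than loaded.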

