Implementing Data Quality
Abstract
A key difference between the Data Vault model and other modeling techniques is that it allows bad data into the Data Vault and applies business rules only after the data has been loaded. This chapter demonstrates, with several examples, how to deal with bad data in the Data Vault, such as de-duplicating records with same-as links. Another topic covered is the application of Data Quality Services (DQS), the data cleansing component of Microsoft SQL Server, to the Data Vault. The authors discuss how to define domains in DQS, document them, and apply them to the data in the Data Vault.
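The same-as link idea mentioned above can be illustrated with a small sketch. It is not the chapter's implementation, which targets Microsoft SQL Server. It is a Python approximation under several stated assumptions: hub entries are matched when they share the same normalized e-mail address (a hypothetical match rule), the earliest-loaded entry becomes the master, and hash keys are MD5 digests over trimmed, upper-cased business keys, a common Data Vault 2.0 convention. The names `HubRecord`, `hash_key`, and `build_same_as_links` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HubRecord:
    business_key: str   # e.g. a customer number from a source system
    email: str          # hypothetical attribute used by the match rule
    load_date: datetime


def hash_key(*parts: str) -> str:
    """MD5 over trimmed, upper-cased, ';'-delimited parts (assumed convention)."""
    joined = ";".join(p.strip().upper() for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest().upper()


def build_same_as_links(records: list[HubRecord]) -> list[tuple[str, str, str]]:
    """Return (link_hk, master_hk, duplicate_hk) rows for detected duplicates.

    Hypothetical match rule: hub entries with the same normalized e-mail
    describe the same business object. The earliest-loaded entry is the master.
    """
    groups: dict[str, list[HubRecord]] = {}
    for r in records:
        groups.setdefault(r.email.strip().lower(), []).append(r)

    links = []
    for members in groups.values():
        # Deterministic master choice: earliest load date, then business key.
        members.sort(key=lambda r: (r.load_date, r.business_key))
        master = members[0]
        master_hk = hash_key(master.business_key)
        for dup in members[1:]:
            link_hk = hash_key(master.business_key, dup.business_key)
            links.append((link_hk, master_hk, hash_key(dup.business_key)))
    return links


records = [
    HubRecord("C-100", "ann@example.com", datetime(2024, 1, 1)),
    HubRecord("C-205", " Ann@Example.com", datetime(2024, 3, 1)),
    HubRecord("C-300", "bob@example.com", datetime(2024, 2, 1)),
]
links = build_same_as_links(records)
# One row: C-205 is mapped to the master entry C-100; C-300 has no duplicate.
for row in links:
    print(row)
```

Note that the raw hub entries stay untouched: the duplicates are resolved by adding link rows rather than by deleting or overwriting loaded data, which fits the approach of loading bad data first and applying business rules afterward. Some implementations also add a self-referencing row for each master; this sketch emits rows for duplicates only.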
Source: Building a Scalable Data Warehouse with Data Vault 2.0 (O’Reilly learning platform).