Chapter 39

Using the DQS Cleansing Transform

In the previous lesson, you created a DQS knowledge base and cleansed some data interactively. As you improve the quality, domain coverage, and capability of your knowledge base, it will be able to correct a larger and larger percentage of values in new incoming data. As this occurs, you will benefit from automating as much of the cleansing as possible. You may want values that are correct, or that were corrected with a high confidence level, to move directly into the destination. Then you can review and fix only the remaining values. This capability exists in SSIS as the DQS Cleansing Transform. Additionally, you can correct and approve or reject the remaining values using the DQS Client. The Cleansing Transform and the DQS Client are designed to cooperate in exactly this way.
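The routing idea described above can be sketched outside of SSIS. The following Python sketch is only an illustration, not the transform's API: the field names status and confidence, the status strings, and the 0.8 threshold are all assumptions made for the example. In a real package this decision would typically be made by a component downstream of the Cleansing Transform.

```python
def route(rows, min_confidence=0.8):
    """Split cleansed rows into an auto-accepted list and a review list.

    Assumed (hypothetical) row shape: a dict with a "status" string and,
    for corrected values, a "confidence" number between 0 and 1.
    """
    accepted, review = [], []
    for row in rows:
        status = row["status"]
        if status == "Correct":
            # Already valid: send straight to the destination.
            accepted.append(row)
        elif status == "Corrected" and row.get("confidence", 0.0) >= min_confidence:
            # Corrected with high confidence: also safe to load directly.
            accepted.append(row)
        else:
            # Everything else waits for a person to approve or reject it.
            review.append(row)
    return accepted, review


if __name__ == "__main__":
    sample = [
        {"value": "Florida", "status": "Correct"},
        {"value": "Florda", "status": "Corrected", "confidence": 0.95},
        {"value": "Flrd", "status": "Corrected", "confidence": 0.55},
        {"value": "Atlantis", "status": "New"},
    ]
    accepted, review = route(sample)
    print(len(accepted), "accepted;", len(review), "for review")
```

Raising min_confidence sends more rows to review; lowering it loads more rows automatically, so the threshold is a trade-off between manual effort and the risk of loading a wrong correction.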

The Cleansing Transform accepts a Data Flow as input, cleanses the data using the knowledge base of your choice, adds output metadata, and passes the Data Flow forward. A commonly used Data Flow for the Cleansing Transform is shown in Figure 39-1.
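To make the shape of that operation concrete, here is a toy Python model of a cleansing step: rows go in, each row comes out with extra columns describing what happened to the value. This is a sketch under stated assumptions and does not call DQS or SSIS. The "knowledge base" is reduced to a set of valid values plus a dictionary of known corrections, and the _Source, _Output, and _Status column suffixes are illustrative names chosen for this example.

```python
def cleanse(rows, column, valid_values, corrections):
    """Yield each input row with added metadata columns for one column.

    valid_values: set of values treated as already correct (assumed stand-in
                  for a knowledge base domain).
    corrections:  dict mapping known bad values to their corrected form.
    """
    for row in rows:
        source = row[column]
        if source in valid_values:
            output, status = source, "Correct"
        elif source in corrections:
            output, status = corrections[source], "Corrected"
        else:
            # Unknown to the toy knowledge base: pass through unchanged.
            output, status = source, "New"
        out = dict(row)  # copy so the input row is not mutated
        out[f"{column}_Source"] = source
        out[f"{column}_Output"] = output
        out[f"{column}_Status"] = status
        yield out


if __name__ == "__main__":
    incoming = [{"State": "Florida"}, {"State": "Florda"}, {"State": "Atlantis"}]
    for cleaned in cleanse(incoming, "State", {"Florida"}, {"Florda": "Florida"}):
        print(cleaned)
```

Because the function is a generator, rows flow through one at a time rather than being collected first, which loosely mirrors how a Data Flow passes rows forward to the next component.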

NOTE The DQS Client is interactive and multithreaded, and it is written to run as fast as possible, because most of us are impatient. The fast run time corresponds to high memory use for the client. Because we do not sit and wait for the Cleansing Transform to run in SSIS, it was written to reduce the memory ...

Get Knight's Microsoft SQL Server 2012 Integration Services 24-Hour Trainer now with O’Reilly online learning.
