Chapter 11
Advanced Data Cleansing in SSIS
WHAT’S IN THIS CHAPTER?
- Using the Derived Column Transformation for advanced data cleansing
- Applying the Fuzzy Lookup and Fuzzy Grouping Transformations and understanding how they work
- Introducing the SQL Server 2012 technology Data Quality Services
In this chapter, you will learn the ins and outs of data cleansing in SSIS, from the basics to the advanced. In a broad sense, one of SSIS’s main purposes is to cleanse data — that is, transform data from a source to a destination and perform operations on it along the way. In that sense, someone could correctly say that every transformation in SSIS is about data cleansing. For example, consider the following transformations:
- The Data Conversion adjusts data types.
- The Sort removes duplicate data.
- The Merge Join correlates data from two sources.
- The Derived Column applies expression logic to data.
- The Data Mining predicts values and exceptions.
- The Script applies .NET logic to data.
- The Term Extraction and Term Lookup perform text mining.
In a stricter sense, data cleansing is about identifying incomplete, incorrect, or irrelevant data and then updating, modifying, or removing the “dirty” data. From this perspective, SSIS has four primary data cleansing transformations, which are reviewed in this chapter:
- Derived Column Transformation: This transformation can perform advanced expression-based data cleansing. If you have just basic data cleansing needs, like blanks or nulls or simple text ...