Chapter 11

Advanced Data Cleansing in SSIS

WHAT’S IN THIS CHAPTER?

  • Using the Derived Column Transformation for advanced data cleansing
  • Applying the Fuzzy Lookup and Fuzzy Grouping Transformations and understanding how they work
  • Introducing the SQL Server 2012 technology Data Quality Services

In this chapter, you will learn the ins and outs of data cleansing in SSIS, from the basics to the advanced. In a broad sense, one of SSIS’s main purposes is to cleanse data: that is, to transform data as it moves from a source to a destination and to perform operations on it along the way. In that sense, someone could correctly say that every transformation in SSIS is about data cleansing. For example, consider the following transformations:

  • The Data Conversion adjusts data types.
  • The Sort orders data and can remove duplicate rows.
  • The Merge Join correlates data from two sources.
  • The Derived Column applies expression logic to data.
  • The Data Mining predicts values and exceptions.
  • The Script applies .NET logic to data.
  • The Term Extraction and Term Lookup perform text mining.
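
To make the broad sense concrete, three of the transformations listed above can be imitated outside SSIS in a few lines of Python. This is only an illustrative sketch, not how SSIS implements them: the sample rows, the column names (`customer_id`, `name`, `clean_name`), and the function names are all hypothetical.

```python
# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"customer_id": "102", "name": "  alice SMITH "},
    {"customer_id": "101", "name": "Bob Jones"},
    {"customer_id": "102", "name": "  alice SMITH "},
]


def convert_types(rows):
    """Rough analogue of Data Conversion: cast customer_id from text to int."""
    return [{**r, "customer_id": int(r["customer_id"])} for r in rows]


def sort_and_dedupe(rows):
    """Rough analogue of Sort with duplicate removal:
    order by the key and keep the first row seen for each key value."""
    seen = set()
    result = []
    for r in sorted(rows, key=lambda r: r["customer_id"]):
        if r["customer_id"] not in seen:
            seen.add(r["customer_id"])
            result.append(r)
    return result


def derive_clean_name(rows):
    """Rough analogue of Derived Column: add a column computed from an expression."""
    return [{**r, "clean_name": r["name"].strip().title()} for r in rows]


# Chain the steps the way components are chained in a data flow.
cleaned = derive_clean_name(sort_and_dedupe(convert_types(rows)))
for row in cleaned:
    print(row)
```

Each function takes rows in and hands rows out, which mirrors how data flow components pass rows downstream. In that loose sense, every step "cleanses" the data a little.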

In a stricter sense, data cleansing is about identifying incomplete, incorrect, or irrelevant data and then updating, modifying, or removing the “dirty” data. From this perspective, SSIS has four primary data cleansing transformations, which are reviewed in this chapter:

  • Derived Column Transformation: This transformation can perform advanced expression-based data cleansing. If you have just basic data cleansing needs, like blanks or nulls or simple text ...
