Knight's Microsoft SQL Server 2012 Integration Services 24-Hour Trainer
by Brian Knight, Devin Knight, Mike Davis, Wayne Snyder
Chapter 30
Handling Bad Data with the Fuzzy Lookup
More often than not, when you are working in the real world, data is not going to be perfect like it is in the AdventureWorks2012 database. Real-world situations call for cleansing dirty data or data that has abnormalities like misspellings or truncation.
Imagine you are attempting to retrieve a foreign key from a dimension table, but, strangely, you find rows without a match. Upon investigation, you find bad data is being supplied to you. One technique might be to divert these rows without matches to a table to be dealt with later; another might be to just add the bad data regardless of misspellings and other mishaps that occur during data entry.
The Fuzzy Lookup Transform, discussed in this lesson, and the Fuzzy Grouping Transform, discussed in the next lesson, gives other alternatives to dealing with dirty data while reducing your number of unmatched rows. The Fuzzy Lookup Transform matches input records with data that has already been cleansed in a reference table. It returns the match and can also indicate the quality of the match. This way you know the likelihood of the match being correct.
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access