Knight's Microsoft SQL Server 2012 Integration Services 24-Hour Trainer

by Brian Knight, Devin Knight, Mike Davis, Wayne Snyder
November 2012
Beginner
528 pages
English
Wrox

Chapter 30

Handling Bad Data with the Fuzzy Lookup

More often than not, real-world data is not as clean as the data in the AdventureWorks2012 sample database. Real-world situations call for cleansing dirty data: data with abnormalities such as misspellings or truncation.

Imagine you are attempting to retrieve a foreign key from a dimension table, but you find rows without a match. On investigation, you find that the source is supplying you with bad data. One technique is to divert the unmatched rows to a table to be dealt with later; another is to load the bad data anyway, despite misspellings and other mistakes made during data entry.
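To make the first option concrete, here is a minimal Python sketch (not SSIS code) of an exact-match lookup that diverts unmatched rows to a separate list for later handling. The product names, surrogate keys, and the `lookup_with_redirect` function are hypothetical illustrations, not AdventureWorks2012 data or an SSIS API.

```python
# Hypothetical dimension data: natural key (product name) -> surrogate key.
# These values are made up for illustration only.
product_dim = {
    "Mountain Bike": 1,
    "Road Bike": 2,
    "Touring Bike": 3,
}


def lookup_with_redirect(rows, dim):
    """Exact-match lookup: rows whose key is found get the surrogate key;
    rows with no match are diverted to a separate list for later cleanup."""
    matched, unmatched = [], []
    for row in rows:
        key = dim.get(row["product"])
        if key is None:
            unmatched.append(row)  # hold for later handling
        else:
            matched.append({**row, "product_key": key})
    return matched, unmatched


incoming = [
    {"order_id": 101, "product": "Road Bike"},
    {"order_id": 102, "product": "Mountian Bike"},  # misspelled
    {"order_id": 103, "product": "Touring Bik"},    # truncated
]

matched, unmatched = lookup_with_redirect(incoming, product_dim)
print([r["order_id"] for r in matched])    # [101]
print([r["order_id"] for r in unmatched])  # [102, 103]
```

A plain exact lookup cannot tell that "Mountian Bike" is almost certainly "Mountain Bike", so two of the three rows end up in the unmatched list even though the intended values are obvious to a human reader.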

The Fuzzy Lookup Transform, discussed in this lesson, and the Fuzzy Grouping Transform, discussed in the next lesson, give you other ways to deal with dirty data while reducing the number of unmatched rows. The Fuzzy Lookup Transform matches input records against data that has already been cleansed and stored in a reference table. It returns the closest match and can also indicate the quality of that match through the _Similarity and _Confidence columns it adds to the output. This way you know how likely the match is to be correct.
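The idea of returning a match together with a quality score can be sketched in Python. This is a stand-in, not the transform's implementation: it uses the standard library's `difflib.SequenceMatcher` ratio as the similarity measure, which is not the algorithm the Fuzzy Lookup Transform uses, and the reference values, the `fuzzy_lookup` function, and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical reference table of already-cleansed values.
reference = ["Mountain Bike", "Road Bike", "Touring Bike"]


def fuzzy_lookup(value, reference, min_similarity=0.8):
    """Return (best_match, similarity). If the best similarity falls below
    min_similarity, return (None, best_similarity) instead of a poor guess."""
    best, best_score = None, 0.0
    for candidate in reference:
        # ratio() is 2*M/T: matched characters over total characters, 0.0 to 1.0
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score < min_similarity:
        return None, best_score
    return best, best_score


match, score = fuzzy_lookup("Mountian Bike", reference)
print(match, round(score, 2))  # Mountain Bike 0.92
print(fuzzy_lookup("Helmet", reference)[0])  # None
```

The threshold is similar in spirit to the transform's similarity threshold: below it, the sketch reports no match rather than returning a weak candidate. The sketch produces only a similarity score and does not attempt to model the separate confidence value that the transform reports.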

NOTE A best practice is to use the Fuzzy Lookup Transform only after first trying a regular lookup on the field. The Fuzzy Lookup Transform is a very expensive operation that builds specialized indexes of the input stream and the reference data for comparison purposes. Therefore, it is recommended to first use a regular Lookup Transform ...
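The regular-lookup-first pattern that the note recommends can be sketched in Python: a cheap hash lookup handles exact matches, and only the misses pay for the expensive fuzzy comparison. Here `difflib.SequenceMatcher` is a stand-in similarity measure, not what SSIS uses, and `resolve_keys`, `best_fuzzy`, the sample data, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical dimension: natural key -> surrogate key.
product_dim = {"Mountain Bike": 1, "Road Bike": 2, "Touring Bike": 3}


def best_fuzzy(value, dim):
    """Scan every reference name and return (name, similarity) for the best one.
    This linear scan over the whole reference set is what makes the path costly."""
    score, name = max(
        (SequenceMatcher(None, value.lower(), n.lower()).ratio(), n) for n in dim
    )
    return name, score


def resolve_keys(values, dim, min_similarity=0.8):
    exact, fuzzy, unresolved = {}, {}, []
    for v in values:
        if v in dim:  # cheap exact (hash) lookup first
            exact[v] = dim[v]
            continue
        name, score = best_fuzzy(v, dim)  # expensive path only for misses
        if score >= min_similarity:
            fuzzy[v] = dim[name]
        else:
            unresolved.append(v)  # still left for manual review
    return exact, fuzzy, unresolved


exact, fuzzy, unresolved = resolve_keys(
    ["Road Bike", "Mountian Bike", "Helmet"], product_dim
)
print(exact)       # {'Road Bike': 2}
print(fuzzy)       # {'Mountian Bike': 1}
print(unresolved)  # ['Helmet']
```

Because most rows in a typical load match exactly, the fuzzy scan runs only for the small remainder, which is the cost saving the note is after.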
ISBN: 9781118539149