Fuzzy Data Matching with SQL

by Jim Lehmer
October 2023
Intermediate to advanced
282 pages
6h 32m
English
O'Reilly Media, Inc.

Chapter 3. Names, Names, Names

What can I say? Names are hard. Is it James or Jim? Spell-checking is impossible: people name their kids anything. Add in cross-cultural differences, and it becomes very hard to do much with names; but we must try! The rest of this chapter assumes you're dealing with a system whose customer records, when the customer is a human, are stored in fields similar to the ones below.

What’s in a Name?

You’ve had to fill out forms with your name since kindergarten. You know the drill on how they are supposed to work. The first three are the most common:

Last name

Or family name or surname. Maybe you have fields for matronymics and patronymics, too.

First name

Or given name. May be optional.

Middle name

Or middle initial or middle names. Optional.

Nickname(s)

Optional.

Suffix

Optional.

Titles and honorifics

Optional, and you’ll learn why we’ll ignore them.

Full name

Often synthesized from the others, but woe to you if your incoming data only has this; we will talk about it at the end!
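To make the field list above concrete, here is a minimal sketch of how such a record might be laid out and how a full name could be synthesized from the parts. It assumes SQLite driven from Python's standard `sqlite3` module; the `customer` table, its column names, and the sample rows are all hypothetical and are not the book's schema.

```python
import sqlite3

# In-memory SQLite database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        last_name   TEXT NOT NULL,  -- family name or surname
        first_name  TEXT,           -- given name, may be missing
        middle_name TEXT,           -- name or initial, optional
        nickname    TEXT,           -- optional
        suffix      TEXT,           -- optional
        title       TEXT            -- optional, ignored below
    )
""")

# Made-up sample rows, inserted in a known order.
conn.executemany(
    "INSERT INTO customer "
    "(last_name, first_name, middle_name, nickname, suffix, title) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Smith", "James", "T", "Jim", "Jr", "Dr"),
        ("Jones", "Mary", None, None, None, None),
        ("Lee", None, None, None, None, None),
    ],
)

# In SQLite, 'x' || NULL is NULL, so each optional part is wrapped in
# COALESCE together with its separator to avoid stray spaces.
rows = conn.execute("""
    SELECT TRIM(
               COALESCE(first_name  || ' ', '') ||
               COALESCE(middle_name || ' ', '') ||
               last_name ||
               COALESCE(' ' || suffix, '')
           ) AS full_name
    FROM customer
    ORDER BY customer_id
""").fetchall()

full_names = [row[0] for row in rows]
print(full_names)
```

The title and nickname are deliberately left out of the synthesized value here; whether that is the right choice depends on how the full name will be used for matching.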

Or if dealing with businesses, simply this:

Company name

While we commonly tear apart human names into their constituent parts, rarely are entity names held in more than one field. No matter the structure of the entity (corporation, partnership, trust, whatever), we jam it into one field. Except sometimes there is another field that looks like a company name.

DBA

“Doing Business As,” often used by an individual who hasn’t established a more formal entity ...

ISBN: 9781098152260