Fuzzy Data Matching with SQL

by Jim Lehmer
October 2023
Intermediate to advanced
282 pages
6h 32m
English
O'Reilly Media, Inc.
Content preview from Fuzzy Data Matching with SQL

Chapter 7. Phone Numbers

With the death of using tax IDs for most data matching (we will talk a bit about them at the end of this chapter), phone numbers, especially mobile phone numbers, are about as close to being a publicly available unique identifier as we’ll ever be able to access. Yes, like email addresses, some people share phone numbers, but the percentage is small and, with the ongoing expansion of mobile phone use, dwindling.

We will look at various issues with phone numbers, including formatting (of course), lack of information (area code, country code), too much information (notes tacked on the end of the number), and the fact that any one person can have a lot of them. Does anyone still have a pager? Should you check against it? Let’s talk about all of that!

What Makes a “Phone Number”?

By this point you should recognize the drill. If you think for more than a few seconds about a field or fields you’re trying to match on, you can immediately think of things that will get in the way. Phone numbers are no exception. Consider the following:

  • (800) 555-1234

  • +1 (800) 555-1234

  • 800-555-1234

  • 1-800-555-1234

  • 8005551234

  • 18005551234

  • (800) 555-1234 Ext. 67 (How does your data handle extensions? How does the incoming data represent it?)

  • 8005551234,,67 (Many modern mobile phones and phone systems still understand and can dial old-style modem “AT commands,” in this case to pause a few seconds after connecting to (800) 555-1234 and then dial 67.)

  • 555-555-1234 ...
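
To make the problem concrete, here is a minimal sketch of one way the variants listed above could be reduced to a single comparable form. It is written in Python purely to illustrate the logic and is not the book's method. The helper name `normalize_phone`, the extension markers it recognizes ("Ext.", "x", or trailing commas), and the assumption that every number belongs to the North American Numbering Plan (10 digits, optionally preceded by country code 1) are all illustrative choices, not something taken from the text.

```python
import re

# Hypothetical pattern: a trailing extension introduced by "ext", "ext.",
# "x", or one or more commas (the dial-string pause character).
_EXT_RE = re.compile(r"(?:\s*(?:ext\.?|x)\s*|,+)(\d+)\s*$", re.IGNORECASE)


def normalize_phone(raw):
    """Return (ten_digit_number, extension) for a NANP-style number.

    Returns (None, None) when the input does not reduce to 10 digits.
    Assumes North American numbers only; this is an illustrative
    simplification, not a general-purpose parser.
    """
    text = raw.strip()
    ext = None

    # Peel off a trailing extension before stripping punctuation,
    # otherwise its digits would be merged into the main number.
    match = _EXT_RE.search(text)
    if match:
        ext = match.group(1)
        text = text[:match.start()]

    # Keep digits only: drops "+", parentheses, spaces, and hyphens.
    digits = re.sub(r"\D", "", text)

    # Treat a leading 1 on an 11-digit number as the country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]

    if len(digits) != 10:
        return None, None
    return digits, ext


if __name__ == "__main__":
    samples = [
        "(800) 555-1234",
        "+1 (800) 555-1234",
        "800-555-1234",
        "1-800-555-1234",
        "8005551234",
        "18005551234",
        "(800) 555-1234 Ext. 67",
        "8005551234,,67",
    ]
    for sample in samples:
        print(sample, "->", normalize_phone(sample))
```

Under these assumptions, the first six samples all reduce to the same 10-digit string with no extension, and the last two reduce to that string plus extension 67. A number missing its area code comes back as `(None, None)` rather than being guessed at, which keeps the "lack of information" cases visible for separate handling.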


Publisher Resources

ISBN: 9781098152260