Chapter 6. Email
Everyone hates email, but now that use of Social Security numbers for data matching is (rightly) becoming scarce, I lean on email addresses fairly heavily. Why? While everyone hates email, everyone tends to have at least one email address, and with few exceptions they tend to be unique (I have an aunt and uncle who share an email address, and that happens, but it is rare). Along with mobile phone numbers, if you have a match on email, your likelihood of a false positive is small.
What Makes a Valid Email Address?
One thing that can throw off matching on email addresses is that Gmail allows optional periods on the recipient side, but their presence or absence makes no difference to Google! Hence, to Gmail, but not to most software:
mortimer.snedley@gmail.com = mortimersnedley@gmail.com
How do you handle this? Remove the periods, of course:
SELECTCOUNT(*)[ExactMatches]FROMcrm.NormalizedCustomerCINNERJOINdbo.PotentialMatchesPONREPLACE(C.,'.','')=REPLACE(P.,'.','');
| Exact Email Matches |
|---|
| 39 |
If you read all the requests for comments (RFCs) on email addresses (and there are a lot of them), you will come to the conclusion that most attempts to validate email addresses on data entry are misguided. Email addresses can contain almost any character, and any format you think they require is probably mistaken. However, valid email addresses do need to contain two specific characters: an @ to separate the delivery address from ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access