Chapter 4. Location, Location, Location
Addresses differ around the world. Although I have worked in Canada and England, I will stick with what I really know and discuss only United States addresses and their components, such as the ZIP code. Most of the techniques presented here are probably applicable elsewhere, perhaps with some tuning to account for differences in postal code formats and so on.
What Makes an Address?
Addresses are composed of many parts:
- Street number: “123”
- Street name: “Main”
- Street type: “St” versus “Blvd” versus “Rd” versus “Hwy”
- Box, suite, lot, or apartment number: Perhaps “Floor” and other variants.
- City: Sometimes called locality in schemas (or even just l in LDAP).
- County: Are you “data quality mature”? In your system’s user interface, is county a cascading drop-down list based on the state chosen or, better, the ZIP code?
- State, province, or state/province abbreviation: Probably the latter.
- ZIP or postal code: What about “+4” for the United States? Does your organization consistently enter and check that for data quality?
- Country: Is it from a constrained drop-down list? Good. If it is a freeform text field that is hand-entered, then probably Not Good.
- Latitude and longitude: Unlikely, or it is getting autopopulated by a background process and still could be wrong (hint: rural addresses, P.O. boxes, etc.). This isn’t useful for address matching, so we will drop it from our discussion.
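To make the components concrete, here is a minimal Python sketch of how they might be held in one record, with a format check for the ZIP code and its optional “+4” extension. The class name USAddress, its field names, and the helper methods are hypothetical, chosen only for illustration, and the sample values are made up. Latitude and longitude are left out, as discussed. The check confirms only that a ZIP code is shaped correctly, not that it exists or belongs to the given city or state.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Five ASCII digits, optionally followed by a hyphen and the four-digit "+4" part.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


@dataclass
class USAddress:
    """Hypothetical container for the address components listed above."""
    street_number: str            # "123"
    street_name: str              # "Main"
    street_type: str              # "St", "Blvd", "Rd", "Hwy", ...
    city: str
    state: str                    # two-letter abbreviation
    zip_code: str                 # "12345" or "12345-6789"
    unit: Optional[str] = None    # box, suite, lot, apartment, or floor
    county: Optional[str] = None
    country: str = "US"

    def has_valid_zip(self) -> bool:
        """Return True if zip_code has a valid ZIP or ZIP+4 format."""
        return ZIP_PATTERN.fullmatch(self.zip_code) is not None

    def has_plus4(self) -> bool:
        """Return True if zip_code is well formed and includes the +4 part."""
        return self.has_valid_zip() and "-" in self.zip_code


# Made-up sample record, for illustration only.
addr = USAddress("123", "Main", "St", "Anytown", "NY", "12345-6789")
print(addr.has_valid_zip(), addr.has_plus4())
```

Keeping each component in its own field, instead of one freeform string, is what makes checks like these (and later, field-by-field matching) possible.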
In the United Kingdom and other Commonwealth countries, ...