Chapter 6. Company Matching

In Chapter 5, we examined the challenge of resolving a larger set of individual entities, matching on name and date of birth. In this chapter, we consider another typical scenario, resolving organization entities so that we can get a more complete picture of their business.

We could perhaps use the date of incorporation of the organization as a discriminator, similar to the way we used date of birth to help identify unique individuals. However, this incorporation date information is not typically included in organization datasets; it is much more common for a company to be identified by its registered address.

Therefore, in this chapter, we will use company address information, along with company names, to identify likely matches. We will then consider how to evaluate a new record for matches against the original data sources without having to undertake a time-consuming retraining of the model.
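
To make the idea of combining name and address evidence concrete, here is a minimal sketch. It is not the model built in this chapter: it uses Python's standard-library difflib for name similarity and assumes the address is reduced to a postcode field. The function names, the suffix list, the record layout, and the 0.85 threshold are all hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
SUFFIXES = {"ltd", "limited", "plc", "llp"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def normalize_postcode(postcode: str) -> str:
    """Uppercase and remove whitespace so 'ab1 2cd' equals 'AB12CD'."""
    return re.sub(r"\s+", "", postcode.upper())


def likely_match(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Flag a likely match when postcodes agree and names are similar enough.

    Assumes each record is a dict with 'name' and 'postcode' keys
    (an illustrative layout, not one taken from the datasets).
    """
    if normalize_postcode(a["postcode"]) != normalize_postcode(b["postcode"]):
        return False
    similarity = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    return similarity >= threshold


# Made-up records, for illustration only.
rec_a = {"name": "Acme Shipping Ltd.", "postcode": "ab1 2cd"}
rec_b = {"name": "ACME SHIPPING LIMITED", "postcode": "AB1 2CD"}
print(likely_match(rec_a, rec_b))
```

Requiring an exact postcode agreement before comparing names keeps the sketch simple, but it would miss companies whose address is recorded differently in each source, which is one reason a probabilistic approach is often preferred.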

Sample Problem

In this chapter, we will resolve a list of company names published by the UK Maritime and Coastguard Agency (MCA) against basic organization details published in the Companies House register. This problem illustrates some of the challenges of identifying distinct references to the same company based on name and address data alone.

UK Companies House provides a free downloadable data snapshot containing basic data on the live companies on the register. This data complements the “person with significant control” data we used in Chapter 5 ...
