Chapter 6: Entity Resolution

6.1 Introduction

6.1.1 Exact Matching

6.1.2 Fuzzy Matching

6.1.3 Error Handling

6.2 Indexing

6.2.1 INDEX=

6.3 Matching

6.3.1 COMPGED and COMPLEV

6.3.2 SOUNDEX

6.3.3 Putting Things Together

6.4 Summary

 

6.1 Introduction

Many robust methods exist for performing entity resolution efficiently at scale. Despite their diversity, they all fall into two basic families: exact matching and fuzzy matching.

Figure 6.1: ERA Flow with Entity Resolution Focus

6.1.1 Exact Matching

Exact matching performs an exact comparison of each entity reference attribute, and makes a “match” ...
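As an illustration of attribute-by-attribute exact comparison, here is a minimal sketch in Python. It is not the book's own code: the function name `exact_match`, the attribute names, and the sample records are hypothetical. It also assumes a match is declared only when every compared attribute is present and identical in both references, which the truncated sentence above does not confirm.

```python
from typing import Mapping, Sequence


def exact_match(ref_a: Mapping[str, str],
                ref_b: Mapping[str, str],
                attributes: Sequence[str]) -> bool:
    """Return True only if every listed attribute is present in both
    references and the two values are identical (assumed match rule)."""
    return all(
        attr in ref_a and attr in ref_b and ref_a[attr] == ref_b[attr]
        for attr in attributes
    )


# Hypothetical entity references, invented for illustration only.
ref_1 = {"first_name": "Ann", "last_name": "Lee", "dob": "1980-02-14"}
ref_2 = {"first_name": "Ann", "last_name": "Lee", "dob": "1980-02-14"}
ref_3 = {"first_name": "Anne", "last_name": "Lee", "dob": "1980-02-14"}

KEYS = ("first_name", "last_name", "dob")

print(exact_match(ref_1, ref_2, KEYS))  # True: all three attributes agree
print(exact_match(ref_1, ref_3, KEYS))  # False: "Ann" != "Anne"
```

In this sketch a one-character difference ("Ann" versus "Anne") is enough to prevent a match, which is the kind of case the fuzzy family is meant to address.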
