Record Linkage - Stochastic and Machine Learning Approaches

In a large database, synonymous records pose a significant problem. Two records are synonymous when they refer to the same entity. In the absence of a common identifier, such as a primary key or foreign key, joining such records on the entity they describe is difficult. Let's illustrate this with a quick example. Consider the following two records:

Sno  First name  Middle name  Last name  Address               City        State  Zip
1    John        NULL         NULL       312 Delray Ave        Deer Field  FL     33433
2    John        NULL         Sanders    312 Delray Beach Ave  Deer Field  FL     33433

Both records refer to the same entity, one Mr. John. Record linkage is an umbrella term for a family of algorithms designed to solve the ...
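To make the example concrete, here is a minimal sketch of how the two records above might be compared field by field when no shared key exists. It is written in Python using only the standard library, and it is an illustration rather than the method this chapter goes on to develop. The helper names (`field_similarity`, `compare_records`, `match_score`), the treatment of NULL as "no evidence", and the simple averaging of field scores are all assumptions made for this sketch.

```python
from difflib import SequenceMatcher

# Fields compared between the two records (Sno is only a row label, so it is skipped).
FIELDS = ["first_name", "middle_name", "last_name", "address", "city", "state", "zip"]

# The two example records; NULL values are represented as None.
record_1 = {"first_name": "John", "middle_name": None, "last_name": None,
            "address": "312 Delray Ave", "city": "Deer Field",
            "state": "FL", "zip": "33433"}
record_2 = {"first_name": "John", "middle_name": None, "last_name": "Sanders",
            "address": "312 Delray Beach Ave", "city": "Deer Field",
            "state": "FL", "zip": "33433"}


def field_similarity(a, b):
    """Return a similarity in [0, 1], or None when either value is missing."""
    if a is None or b is None:
        return None  # assumption: a missing value is neither agreement nor disagreement
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare_records(r1, r2):
    """Build a per-field comparison for a pair of records."""
    return {f: field_similarity(r1[f], r2[f]) for f in FIELDS}


def match_score(comparison):
    """Average the similarities of the fields present in both records."""
    scores = [s for s in comparison.values() if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


comparison = compare_records(record_1, record_2)
score = match_score(comparison)

for field, similarity in comparison.items():
    print(field, similarity)
print("match score:", round(score, 3))
```

In this sketch the first name, city, state, and zip agree exactly, the two addresses are similar but not identical, and the name fields holding NULL contribute nothing either way. A high overall score suggests that the pair is a candidate match. How such per-field comparisons should be weighted and thresholded is the kind of question that stochastic and machine learning approaches address.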
