Record Linkage - Stochastic and Machine Learning Approaches

In a large database, synonymous records pose a significant problem. Two records are considered synonymous when they refer to the same entity. In the absence of a common identifier, such as a primary key or foreign key, joining such records on the entity they describe is a tough task. Let's illustrate this with a quick example. Consider the following two records:

Sno  First name  Middle name  Last name  Address               City        State  Zip
1    John        NULL         NULL       312 Delray Ave        Deer Field  FL     33433
2    John        NULL         Sanders    312 Delray Beach Ave  Deer Field  FL     33433

Both records refer to the same entity, one Mr. John. Record linkage is an umbrella term for algorithms that are designed to solve the ...
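To make the idea concrete, here is a minimal Python sketch that compares the two example records field by field. It is not the method this chapter goes on to develop: the helper names (`field_similarity`, `compare_records`, `match_score`) are hypothetical, the string similarity comes from the standard library's `difflib.SequenceMatcher` as a stand-in for whatever comparison function a real record linkage tool would use, and averaging the comparable fields is an assumed scoring rule chosen only for illustration.

```python
from difflib import SequenceMatcher

FIELDS = ["first_name", "middle_name", "last_name",
          "address", "city", "state", "zip"]

# The two example records; NULL values are represented as None.
record_1 = {"first_name": "John", "middle_name": None, "last_name": None,
            "address": "312 Delray Ave", "city": "Deer Field",
            "state": "FL", "zip": "33433"}
record_2 = {"first_name": "John", "middle_name": None, "last_name": "Sanders",
            "address": "312 Delray Beach Ave", "city": "Deer Field",
            "state": "FL", "zip": "33433"}


def field_similarity(a, b):
    """Return a similarity in [0, 1], or None if either value is missing."""
    if a is None or b is None:
        return None
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare_records(r1, r2):
    """Compare two records field by field."""
    return {f: field_similarity(r1[f], r2[f]) for f in FIELDS}


def match_score(similarities):
    """Average the fields that could be compared (assumed scoring rule)."""
    known = [s for s in similarities.values() if s is not None]
    return sum(known) / len(known) if known else 0.0


sims = compare_records(record_1, record_2)
score = match_score(sims)

for field, sim in sims.items():
    print(field, "missing" if sim is None else round(sim, 2))
print("overall score:", round(score, 2))
```

The fields with NULL values contribute no evidence either way, the address fields agree only partially, and the remaining fields agree exactly. Deciding how to weigh such partial and missing evidence is exactly the question record linkage methods address.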
