Expectation maximization method

The emWeights method is based on the expectation maximization (EM) algorithm. It derives a weight for each record pair that measures how closely the two entities resemble each other. To compute this weight, two conditional probabilities have to be estimated: one given that the pair is a match, and another given that it is not.
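The sketch below makes the two conditional probabilities concrete in base R. It is a minimal illustration, not the emWeights implementation. It assumes binary agreement indicators per compared field and assumes the fields are conditionally independent given match status. The per-field probabilities `m` (agreement given a match) and `u` (agreement given a non-match) are made-up values, and the helpers `pattern_prob` and `pattern_weight` are hypothetical names.

```r
# Hypothetical per-field probabilities for three compared fields (illustrative values)
# m[k] = P(field k agrees | match = 1); u[k] = P(field k agrees | match = 0)
m <- c(fname = 0.95, lname = 0.90, by = 0.85)
u <- c(fname = 0.10, lname = 0.05, by = 0.02)

# Probability of a 0/1 agreement pattern under one class,
# assuming fields are conditionally independent given match status
pattern_prob <- function(gamma, p) prod(p^gamma * (1 - p)^(1 - gamma))

# Fellegi-Sunter style weight: log2 of P(pattern | match) / P(pattern | non-match)
pattern_weight <- function(gamma, m, u) {
  log2(pattern_prob(gamma, m) / pattern_prob(gamma, u))
}

all_agree  <- pattern_weight(c(1, 1, 1), m, u)  # agrees on every field
all_differ <- pattern_weight(c(0, 0, 0), m, u)  # disagrees on every field
print(round(c(all_agree = all_agree, all_differ = all_differ), 2))
```

A pair that agrees on all three fields receives a large positive weight, and a pair that disagrees on all of them receives a large negative one. Taking the logarithm of the ratio is the usual Fellegi-Sunter convention, so that per-field contributions add up instead of multiplying.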

P(features | match = 0) and P(features | match = 1) are estimated using the EM algorithm. The weight of a pair is then calculated from the ratio of the match probability to the non-match probability (in practice, the logarithm of this ratio). This approach is known as the Fellegi-Sunter model.
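How EM can estimate these probabilities without labeled pairs can be sketched in base R as a two-class latent mixture of independent Bernoulli agreement indicators. This is an illustration of the technique under stated assumptions, not RecordLinkage's internal code. The comparison patterns are simulated (hypothetical data with four binary fields and about 5% true matches), and `em_fs` is a hypothetical function name. Starting `m` above `u` keeps the "match" class from swapping labels with the "non-match" class.

```r
set.seed(42)

# Simulated comparison patterns (hypothetical data): 4 binary fields, ~5% matches
n <- 2000; k <- 4
true_m <- c(0.95, 0.90, 0.85, 0.80)
true_u <- c(0.10, 0.05, 0.02, 0.20)
is_match <- rbinom(n, 1, 0.05)
gamma <- sapply(seq_len(k), function(j)
  rbinom(n, 1, ifelse(is_match == 1, true_m[j], true_u[j])))  # n x k matrix

# EM for P(match), m[k] = P(agree_k | match), u[k] = P(agree_k | non-match)
em_fs <- function(gamma, m = rep(0.9, ncol(gamma)), u = rep(0.1, ncol(gamma)),
                  p = 0.1, iter = 500, tol = 1e-8) {
  for (it in seq_len(iter)) {
    # E-step: posterior probability that each pair is a match
    lm <- exp(gamma %*% log(m) + (1 - gamma) %*% log(1 - m))
    lu <- exp(gamma %*% log(u) + (1 - gamma) %*% log(1 - u))
    g <- as.vector(p * lm / (p * lm + (1 - p) * lu))
    # M-step: re-estimate the match proportion and per-field probabilities
    p_new <- mean(g)
    m_new <- colSums(g * gamma) / sum(g)
    u_new <- colSums((1 - g) * gamma) / sum(1 - g)
    converged <- max(abs(c(m_new - m, u_new - u, p_new - p))) < tol
    m <- m_new; u <- u_new; p <- p_new
    if (converged) break
  }
  list(m = m, u = u, p = p)
}

fit <- em_fs(gamma)

# Log2 likelihood-ratio weight for every pair, from the estimated m and u
w <- as.vector(gamma %*% log2(fit$m / fit$u) +
               (1 - gamma) %*% log2((1 - fit$m) / (1 - fit$u)))
print(round(unlist(fit), 3))
```

On data like this, true matches should end up with clearly higher weights than non-matches, which is what produces the two-humped weight distribution typically seen when such weights are plotted.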

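    library(RecordLinkage)
    data("RLdata500")
    # Build candidate pairs for deduplication: pairs must agree on field 1
    # (fname_c1) or on fields 5 to 7 (by, bm, bd); fields 2 to 4 are
    # compared with Levenshtein similarity
    rec.pairs <- compare.dedup(RLdata500,
                               blockfld = list(1, 5:7),
                               strcmp = c(2, 3, 4),
                               strcmpfun = levenshteinSim)
    # Estimate the EM-based weights and plot their distribution
    pairs.weights <- emWeights(rec.pairs)
    hist(pairs.weights$Wdata)
...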
