The emWeights method uses the expectation-maximization (EM) algorithm to derive weights that measure how close two entities are. The method requires two conditional probabilities: one for a match and another for a non-match.
P(features | match = 0) and P(features | match = 1) are estimated using the EM algorithm. The weight of a record pair is the ratio of these two probabilities, P(features | match = 1) / P(features | match = 0), usually taken on a log scale, so a higher weight indicates a more likely match. This approach is called the Fellegi-Sunter model.
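The weight calculation described above can be sketched in base R. This is a minimal illustration, not the internals of emWeights: the probabilities in m and u are made-up values (emWeights would estimate them with EM), the helper pattern_weight is hypothetical, the features are assumed to be binary (agree or disagree) and conditionally independent given the match status, and the log2 scale is an assumption.

```r
# Hypothetical probabilities for three binary comparison features.
# m = P(feature agrees | match = 1), u = P(feature agrees | match = 0).
# These values are invented for illustration only.
m <- c(name = 0.95, birth_year = 0.90, city = 0.80)
u <- c(name = 0.01, birth_year = 0.05, city = 0.20)

# Log2 likelihood-ratio weight of one comparison pattern
# (1 = feature agrees, 0 = feature disagrees), assuming the
# features are conditionally independent given the match status.
pattern_weight <- function(pattern, m, u) {
  p_match   <- prod(ifelse(pattern == 1, m, 1 - m))  # P(features | match = 1)
  p_nomatch <- prod(ifelse(pattern == 1, u, 1 - u))  # P(features | match = 0)
  log2(p_match / p_nomatch)
}

w_all_agree    <- pattern_weight(c(1, 1, 1), m, u)  # all features agree
w_all_disagree <- pattern_weight(c(0, 0, 0), m, u)  # all features disagree
```

Under these assumed values, a pair that agrees on every feature gets a positive weight and a pair that disagrees on every feature gets a negative one, which is what makes the weight usable as a measure of closeness.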
> library(RecordLinkage)
> data("RLdata500")
> rec.pairs <- compare.dedup(RLdata500
+     ,blockfld = list(1, 5:7)
+     ,strcmp = c(2,3,4)
+     ,strcmpfun = levenshteinSim)
> pairs.weights <- emWeights(rec.pairs)
> hist(pairs.weights$Wdata)
...