November 2017
Beginner to intermediate
366 pages
7h 59m
English
The expectation maximization algorithm for record linkage:
library(RecordLinkage)
data("RLdata500")

# Em weight calculation
rec.pairs <- compare.dedup(RLdata500
                           ,blockfld = list(1, 5:7)
                           ,strcmp = c(2,3,4)
                           ,strcmpfun = levenshteinSim)
pairs.weights <- emWeights(rec.pairs)
hist(pairs.weights$Wdata)
summary(pairs.weights)
weights.df <- getPairs(pairs.weights)
head(weights.df)

# Classification
pairs.classify <- emClassify(pairs.weights, threshold.upper = 10, threshold.lower = 5)

# View the matches
final.results <- pairs.classify$pairs
final.results$weight <- pairs.classify$Wdata
final.results$links <- pairs.classify$prediction
head(final.results)
counts <- table(final.results$links)
barplot(counts, main="Link Distribution", xlab="Link ...
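The weights computed by emWeights and the thresholds passed to emClassify can be read against the standard Fellegi-Sunter model. The sketch below states that model under assumptions that are not taken from the excerpt: binary agreement indicators, conditional independence between fields, and the symbols gamma, m_k, u_k, p, g_j, T_upper and T_lower, which are illustrative notation only. The package's internal computation may differ in detail (for example, string similarities are typically reduced to agree/disagree by a cutoff before estimation).

```latex
% Assumed notation (illustrative, not from the source):
%   \gamma_{jk} \in \{0,1\}  agreement of record pair j on field k, k = 1..K
%   m_k = P(\gamma_k = 1 \mid \text{match}),  u_k = P(\gamma_k = 1 \mid \text{non-match})
%   p   = prior probability that a pair is a match,  N = number of pairs
\begin{align*}
% E-step: posterior probability that pair j is a match
g_j &= \frac{p \prod_{k=1}^{K} m_k^{\gamma_{jk}} (1-m_k)^{1-\gamma_{jk}}}
           {p \prod_{k=1}^{K} m_k^{\gamma_{jk}} (1-m_k)^{1-\gamma_{jk}}
            + (1-p) \prod_{k=1}^{K} u_k^{\gamma_{jk}} (1-u_k)^{1-\gamma_{jk}}} \\[4pt]
% M-step: re-estimate the parameters from the posteriors
m_k &= \frac{\sum_{j=1}^{N} g_j \,\gamma_{jk}}{\sum_{j=1}^{N} g_j}, \qquad
u_k = \frac{\sum_{j=1}^{N} (1-g_j)\,\gamma_{jk}}{\sum_{j=1}^{N} (1-g_j)}, \qquad
p = \frac{1}{N}\sum_{j=1}^{N} g_j \\[4pt]
% Matching weight of pair j (base-2 log-likelihood ratio)
w_j &= \sum_{k=1}^{K} \left[ \gamma_{jk} \log_2 \frac{m_k}{u_k}
      + (1-\gamma_{jk}) \log_2 \frac{1-m_k}{1-u_k} \right] \\[4pt]
% Decision rule with two thresholds
\text{pair } j &\mapsto
\begin{cases}
\text{link} & \text{if } w_j \ge T_{\text{upper}} \\
\text{possible link} & \text{if } T_{\text{lower}} \le w_j < T_{\text{upper}} \\
\text{non-link} & \text{if } w_j < T_{\text{lower}}
\end{cases}
\end{align*}
```

Read this way, threshold.upper = 10 and threshold.lower = 5 in the call above would play the roles of T_upper and T_lower, which is why the histogram of pairs.weights$Wdata is worth inspecting before choosing them.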