November 2017
Beginner to intermediate
366 pages
7h 59m
English
The weights-based algorithm for record linkage:
library(RecordLinkage)
data("RLdata500")

# weight calculation
rec.pairs <- compare.dedup(RLdata500,
                           blockfld = list(1, 5:7),
                           strcmp = c(2, 3, 4),
                           strcmpfun = levenshteinSim)
pairs.weights <- epiWeights(rec.pairs)
hist(pairs.weights$Wdata)
summary(pairs.weights)

weights.df <- getPairs(pairs.weights)
head(weights.df)

# Classification
pairs.classify <- emClassify(pairs.weights, threshold.upper = 0.5, threshold.lower = 0.3)

# View the matches
final.results <- pairs.classify$pairs
final.results$weight <- pairs.classify$Wdata
final.results$links <- pairs.classify$prediction
head(final.results)

counts <- table(final.results$links)
barplot(counts, main="Link Distribution",
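To make the classification step easier to follow, here is a minimal base R sketch of how two thresholds can split pair weights into links, possible links, and non-links. It does not call RecordLinkage. The function name classify_by_threshold and the sample weights are hypothetical, and the exact boundary handling (>= versus >) is an assumption, not something confirmed by the listing above. The labels "L", "P", and "N" follow the convention RecordLinkage uses for its predictions.

```r
# Hypothetical helper: label each weight using an upper and a lower threshold.
# Assumption: weights at or above `upper` are links ("L"), weights at or above
# `lower` but below `upper` are possible links ("P"), the rest are non-links ("N").
classify_by_threshold <- function(w, upper, lower) {
  stopifnot(is.numeric(w), upper >= lower)
  labels <- ifelse(w >= upper, "L", ifelse(w >= lower, "P", "N"))
  factor(labels, levels = c("N", "P", "L"))
}

# Made-up weights on a 0-1 scale, for illustration only
sample.weights <- c(0.82, 0.45, 0.12, 0.50, 0.30)

sample.links <- classify_by_threshold(sample.weights, upper = 0.5, lower = 0.3)

# Count how many pairs fall into each class
sample.counts <- table(sample.links)
print(sample.counts)
```

With the same thresholds as the listing (0.5 and 0.3), pairs whose weight falls between the two values end up in the "P" class, which is typically the group set aside for manual review.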