Supervised learning

In a supervised learning scenario, we need to provide the algorithm with a set of training tuples. Each tuple holds the features computed for a record pair and a label classifying that pair as either a match or a non-match. In our case, we don't have any labeled data.
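To make the shape of a training tuple concrete, here is a minimal base-R sketch. The toy records, the helper make_tuple, and the use of exact-equality agreement (1 = fields agree, 0 = they differ) as features are hypothetical choices for illustration only; they are not part of the RecordLinkage API.

```r
# Hypothetical toy records (illustrative only, not taken from RLdata500)
records <- data.frame(
  fname = c("ANNA", "ANNA", "PAUL"),
  lname = c("WEBER", "WEBER", "KOCH"),
  by    = c(1970, 1970, 1980),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one training tuple = binary agreement features for the
# record pair (i, j) plus a 0/1 label supplied by whoever labeled the data
make_tuple <- function(df, i, j, is_match) {
  features <- as.integer(unlist(df[i, ]) == unlist(df[j, ]))
  names(features) <- names(df)
  c(id1 = i, id2 = j, features, is_match = as.integer(is_match))
}

# Stack the tuples into a training matrix, one row per labeled pair
training <- rbind(
  make_tuple(records, 1, 2, TRUE),
  make_tuple(records, 1, 3, FALSE),
  make_tuple(records, 2, 3, FALSE)
)
print(training)
```

The feature columns can be computed for any pair; it is the is_match column that requires labeled data.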

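The RecordLinkage package does, however, provide ground truth for RLdata500: a numeric vector called identity.RLdata500 that stores the true entity ID of each of the 500 records, so two records are duplicates exactly when their identity values agree. We can pass this vector to compare.dedup through its identity parameter, which adds a match label (the is_match column) to every record pair: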

> library(RecordLinkage)
> data(RLdata500)    # example records plus identity.RLdata500
> str(identity.RLdata500)
 num [1:500] 34 51 115 189 72 142 162 48 133 190 ...
> rec.pairs <- compare.dedup(RLdata500
+                            ,identity = identity.RLdata500
+                            ,blockfld = list(1, 5:7)
+ )
> head(rec.pairs$pairs)
...
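The compare.dedup call combines two steps that a short base-R sketch can make visible. Blocking keeps only pairs that agree on every column of at least one blockfld entry (our reading of the parameter), and labeling marks a pair as a match when its two identity values agree. The toy data frame, the identity vector, and the helper block_pairs are hypothetical, and we assume, as in RLdata500, that columns 5 to 7 hold birth year, month and day. This is a simplified imitation, not RecordLinkage's implementation: it skips the field-by-field comparison patterns that compare.dedup also computes.

```r
# Hypothetical toy data laid out like RLdata500 (first/last name components,
# then birth year, month, day), so list(1, 5:7) selects comparable fields
df <- data.frame(
  fname_c1 = c("ANNA", "ANNA", "PAUL", "ERIK", "ANNA"),
  fname_c2 = NA_character_,
  lname_c1 = c("WEBER", "WEBER", "KOCH", "KOCH", "WEBER"),
  lname_c2 = NA_character_,
  by = c(1970, 1971, 1980, 1980, 1970),
  bm = c(1, 1, 5, 5, 1),
  bd = c(2, 2, 9, 9, 2),
  stringsAsFactors = FALSE
)
# Hypothetical identity vector: records sharing a value are the same person
identity <- c(1, 1, 2, 3, 1)

# Hypothetical helper: keep a pair (i < j) if it agrees on every column of at
# least one blocking definition; NA is treated as disagreement (an assumption)
block_pairs <- function(df, blocks) {
  all_pairs <- t(combn(nrow(df), 2))
  keep <- logical(nrow(all_pairs))
  for (cols in blocks) {
    agree <- apply(all_pairs, 1, function(p) {
      isTRUE(all(unlist(df[p[1], cols]) == unlist(df[p[2], cols])))
    })
    keep <- keep | agree
  }
  all_pairs[keep, , drop = FALSE]
}

# Block on first name, or on the full birth date
pairs <- block_pairs(df, list(1, 5:7))
colnames(pairs) <- c("id1", "id2")

# Label each candidate pair: a true match when both identity values agree
is_match <- as.integer(identity[pairs[, "id1"]] == identity[pairs[, "id2"]])
labeled <- data.frame(pairs, is_match = is_match)
print(labeled)
```

Blocking shrinks the 10 possible pairs of this toy set to 4 candidates, and the identity vector turns those candidates into labeled training tuples; pairs 3 and 4 share a last name and birth date but carry different identity values, so they are labeled as a non-match.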
