33 Record Linkage for Establishments: Background, Challenges, and an Example
Michael D. Larsen¹ and Alan Herning²
¹Department of Mathematics and Statistics, Saint Michael's College, Colchester, VT, USA
²Data Strategy, Integration and Services Division, Australian Bureau of Statistics, Canberra, Australia
33.1 Introduction
The goal of record linkage is to identify singular entities across two or more databases. Even when unique and error‐free identification numbers do not exist in all the databases, it can be possible to identify multiple representations of a single entity with low probability of error. Discerning correct linkages relies on the comparison of variables in the separate files that strongly suggest that representations in the files pertain to the same entity.
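As a minimal sketch of what "comparison of variables in the separate files" can look like in practice, the Python snippet below compares two records field by field and reports which fields agree. It is an illustration only, not the authors' method: the records, the field names (`name`, `street`, `postcode`), the helper functions, and the 0.85 similarity threshold are all hypothetical, and the string similarity comes from the standard library's `difflib.SequenceMatcher` rather than from any comparator named in the chapter.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(str(value).lower().split())


def similarity(a, b):
    """Return a string similarity in [0, 1] using the standard library's SequenceMatcher."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def compare_records(rec_a, rec_b, fields, threshold=0.85):
    """Return a per-field agreement vector: True where similarity meets the threshold.

    The threshold is an arbitrary illustrative choice, not a recommended value.
    """
    return {f: similarity(rec_a[f], rec_b[f]) >= threshold for f in fields}


# Hypothetical establishment records from three different files.
file_a = {"name": "Acme Widgets Pty Ltd", "street": "12 Harbour Road", "postcode": "2600"}
file_b = {"name": "ACME Widgets Pty. Ltd.", "street": "12 Harbour Rd", "postcode": "2600"}
file_c = {"name": "Zenith Bakery", "street": "4 Mill Lane", "postcode": "3000"}

FIELDS = ["name", "street", "postcode"]

agree_ab = compare_records(file_a, file_b, FIELDS)  # likely the same establishment
agree_ac = compare_records(file_a, file_c, FIELDS)  # clearly different establishments

print(agree_ab)
print(agree_ac)
```

In this toy example, every field agrees for the first pair despite differences in case, punctuation, and abbreviation, while no field agrees for the second pair. A real linkage system would go further and weigh such agreement patterns to control the probability of a false link.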
Record linkage is applied in many contexts. Many examples can be found in Kilss and Alvey (1985), Alvey and Jamerson (1997), Herzog et al. (2007), Christen (2012), and Chun et al. (2021). Possibly the most common application is linking people across databases; see, for example, CDC (2020) and US Census Bureau (2021). Increasingly, surveys are designed and implemented with the expectation that survey respondents, and possibly nonrespondents as well, will be linked to another source. The second source could be a census or population register, administrative files collected for other purposes, or another survey. Establishments might seek to link internal records on their members or constituent units to other sources. ...