Chapter 3. Cross-Sectional Data: Research Registries

One of the first real-life de-identification challenges we faced was releasing data from a maternal-child registry. It’s common to create registries specifically to hold data that will be disclosed for secondary purposes. The Better Outcomes Registry & Network (BORN) of Ontario[43] integrates the data for all hospital and home births in the province, about 140,000 births per year. It was created to improve the provision of health care and to support research. But without the ability to de-identify data sets, most research using this data would grind to a halt.

The data set from the BORN registry is cross-sectional, in that we cannot trace mothers over time. If a mother has a baby in 2009 and another in 2011, it’s simply not possible to know that it was the same woman. This kind of data is quite common in registries and surveys. We use the BORN data set a number of times throughout the book to illustrate various methods because it’s a good baseline data set to work with and one that we know well.
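To make the idea of cross-sectional data concrete, here is a minimal sketch in Python. The record layout, the field names, and the `mother_id` key are hypothetical and chosen only for illustration; they are not the actual BORN schema. The point is that each birth is a standalone record with no persistent identifier tying it to the mother's other births.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BirthRecord:
    # Hypothetical fields for illustration only; not the actual BORN schema.
    birth_year: int
    maternal_age: int
    region: str


# Two births that could belong to the same mother, stored as independent rows.
records = [
    BirthRecord(birth_year=2009, maternal_age=30, region="A"),
    BirthRecord(birth_year=2011, maternal_age=32, region="A"),
]


def has_linking_key(record_type, key_name="mother_id"):
    """Return True if the record type has a field that could link a mother's births."""
    return any(f.name == key_name for f in fields(record_type))


# No such field exists, so the two rows cannot be tied to one woman with certainty.
can_link = has_linking_key(BirthRecord)
print(can_link)
```

The two rows look compatible with one mother (same region, ages two years apart), but without a linking key that is only a guess, which is what makes the data cross-sectional rather than longitudinal.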

Process Overview

We’ll come back to BORN later. First we’ll discuss the general process of getting de-identified data from a research registry.

Secondary Uses and Disclosures

Once the data is in the registry, outside researchers can make requests for access to data sets containing individual-level records. There’s a process to approve these data requests, and it illustrates an important part of the de-identification process: health research ...
