Chapter 2. Identifiability Spectrum

When identifiability is viewed as a spectrum, with identified data at one end and anonymized data at the other, we find ourselves with a range of options for sharing and using data responsibly. This gives us an opportunity to tailor those options to specific use cases and data flows.

Before we dig into the details of the identifiability spectrum in this chapter, we will explore some of the questions and concerns around data sharing. From legal interpretation to practical considerations, we need to understand the privacy issues involved before we can find creative ways to address them. And if our objective is to understand anonymization, we need to understand how identifiability is considered from a statistical point of view. We will start by looking at how identifiability in data is estimated, and then, in Chapter 3, we will explore risk assessments based on the context of the data sharing itself. Let’s begin with the legal landscape that informs any discussion of identifiability.

Legal Landscape

This isn’t a book about privacy or data protection laws and regulations, but it’s hard not to at least consider the legislative landscape when discussing anonymization. Legal interpretations change, as do the policies around them, so we’re not going to spend time on this subject except to highlight a few important points as they relate to the material in this book. The concept of identifiability is important because ...
