Building an Anonymization Pipeline

by Luk Arbuckle, Khaled El Emam

April 2020

Intermediate to advanced

164 pages

5h 9m

English

O'Reilly Media, Inc.

Preface
    Why We Wrote This Book
    Who This Book Was Written For
    How This Book Is Organized
    Conventions Used in This Book
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
1. Introduction
    Identifiability
    Getting to Terms
    Laws and Regulations
    States of Data
    Anonymization as Data Protection
    Approval or Consent
    Purpose Specification
    Re-identification Attacks
    Anonymization in Practice
    Final Thoughts
2. Identifiability Spectrum
    Legal Landscape
    Disclosure Risk
    Types of Disclosure
    Dimensions of Data Privacy
    Re-identification Science
    Defined Population
    Direction of Matching
    Structure of Data
    Overall Identifiability
    Final Thoughts
3. A Practical Risk-Management Framework
    Five Safes of Anonymization
    Safe Projects
    Safe People
    Safe Settings
    Safe Data
    Safe Outputs
    Five Safes in Practice
    Final Thoughts
4. Identified Data
    Requirements Gathering
    Use Cases
    Data Flows
    Data and Data Subjects
    From Primary to Secondary Use
    Dealing with Direct Identifiers
    Dealing with Indirect Identifiers
    From Identified to Anonymized
    Mixing Identified with Anonymized
    Applying Anonymized to Identified
    Final Thoughts
5. Pseudonymized Data
    Data Protection and Legal Authority
    Pseudonymized Services
    Legal Authority
    Legitimate Interests
    A First Step to Anonymization
    Revisiting Primary to Secondary Use
    Analytics Platforms
    Synthetic Data
    Biometric Identifiers
    Final Thoughts
6. Anonymized Data
    Identifiability Spectrum Revisited
    Making the Connection
    Anonymized at Source
    Additional Sources of Data
    Pooling Anonymized Data
    Pros/Cons of Collecting at Source
    Methods of Collecting at Source
    Safe Pooling
    Access to the Stored Data
    Feeding Source Anonymization
    Final Thoughts
7. Safe Use
    Foundations of Trust
    Trust in Algorithms
    Techniques of AIML
    Technical Challenges
    Algorithms Failing on Trust
    Principles of Responsible AIML
    Governance and Oversight
    Privacy Ethics
    Data Monitoring
    Final Thoughts
Index

Overview

How can you use data in a way that protects individual privacy but still provides useful and meaningful analytics? With this practical book, data architects and engineers will learn how to establish and integrate secure, repeatable anonymization processes into their data flows and analytics in a sustainable manner.

Luk Arbuckle and Khaled El Emam from Privacy Analytics explore end-to-end solutions for anonymizing device and IoT data, based on collection models and use cases that address real business needs. These examples come from some of the most demanding data environments, such as healthcare, using approaches that have withstood the test of time.

Create anonymization solutions diverse enough to cover a spectrum of use cases
Match your solutions to the data you use, the people you share it with, and your analysis goals
Build anonymization pipelines around various data collection models to cover different business needs
Generate an anonymized version of original data or use an analytics platform to generate anonymized outputs
Examine the ethical issues around the use of anonymized data

Publisher Resources

ISBN: 9781492053422
