Chapter 9. Differentially Private Machine Learning
Machine learning (ML) is the process of learning relationships and patterns in a data set, typically with an emphasis on predictive accuracy. Statistical modeling, as discussed in Chapter 8, places greater emphasis on model interpretability. This difference in emphasis happens to form a natural division in DP techniques.
ML model parameters can leak information about the training data, just as they can in statistical modeling. When you privately train a model, your goal is to release parameters/weights for the model that accurately capture the relationship between variables while protecting your sensitive data with the guarantees of differential privacy.
In this chapter, you will learn about a variety of techniques that are typically used to privately train ML models. Stochastic gradient descent (SGD) is a focal point, as it is the workhorse of non-DP ML training.
This chapter assumes a working knowledge of non-DP ML and relies heavily on concepts introduced in Chapters 3 through 6. While this may seem daunting, the chapter starts with a more approachable minimum viable DP-SGD before gradually mixing in more advanced tools.
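To make the starting point concrete, here is a minimal sketch of the two ingredients DP-SGD adds to ordinary SGD: clipping each example's gradient to bound any one record's influence, then adding Gaussian noise before the parameter update. The `dp_sgd_step` helper and all parameter names are hypothetical, not the chapter's implementation, and calibrating `noise_mult` to a concrete privacy budget is deliberately left out here.

```python
import numpy as np

def dp_sgd_step(weights, X, y, lr, clip_norm, noise_mult, rng):
    """One DP-SGD step for least-squares linear regression (illustrative
    sketch only): per-example gradient clipping + Gaussian noise."""
    clipped = []
    for xi, yi in zip(X, y):
        # Per-example gradient of the squared error: 2 * (x.w - y) * x
        g = 2.0 * (xi @ weights - yi) * xi
        # Clip to bound each example's contribution (sensitivity) by clip_norm
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)
        clipped.append(g)
    # Sum the clipped gradients, then add noise scaled to the clipping bound
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_mult * clip_norm, size=weights.shape)
    return weights - lr * noisy_sum / len(X)

# Synthetic data: y = X @ true_w + small observation noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
for _ in range(500):
    w = dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_mult=0.5, rng=rng)
```

Even with clipping and noise, the recovered `w` lands close to `true_w` on this easy problem; the chapter's real treatment will show how the clipping norm and noise multiplier trade accuracy against the privacy guarantee.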
The chapter ends with a discussion and examples of frameworks and tools that will help you create DP ML models. Before diving in, we'll motivate the use of DP in this domain by discussing privacy attacks.
Why Make Machine Learning Models Differentially Private?
Suppose you are running a company that sells online educational ...