Hands-On Differential Privacy

Book description

Many organizations today analyze and share large, sensitive datasets about individuals. Whether these datasets cover healthcare details, financial records, or exam scores, it's become more difficult for organizations to protect an individual's information through deidentification, anonymization, and other traditional statistical disclosure limitation techniques. This practical book explains how differential privacy (DP) can help.

Authors Ethan Cowan, Michael Shoemate, and Mayana Pereira explain how these techniques enable data scientists, researchers, and programmers to run statistical analyses that hide the contribution of any single individual. You'll dive into basic DP concepts and understand how to use open source tools to create differentially private statistics, explore how to assess the utility/privacy trade-offs, and learn how to integrate differential privacy into workflows.

With this book, you'll learn:

  • How DP guarantees privacy when other data anonymization methods don't
  • What preserving individual privacy in a dataset entails
  • How to apply DP in several real-world scenarios and datasets
  • Potential privacy attack methods, including what it means to perform a reidentification attack
  • How to use the OpenDP library in privacy-preserving data releases
  • How to interpret guarantees provided by specific DP data releases


Table of contents

  1. Preface
    1. The Structure of this Book
      1. Part 1: Differential Privacy Concepts
      2. Part 2: Differential Privacy in Practice
      3. Part 3: Deploying Differential Privacy
    2. Conventions Used in This Book
    3. Using Code Examples
    4. O’Reilly Online Learning
    5. How to Contact Us
    6. Acknowledgments
  2. 1. Welcome to Differential Privacy
    1. History
    2. Privatization Before Differential Privacy
    3. Case Study: Applying DP in a Classroom
      1. Privacy and the Mean
      2. How Could This Be Prevented?
    4. Adjacent Data Sets: What if Someone Else Had Dropped the Class?
    5. Sensitivity: How Much Can the Statistic Change?
    6. Adding Noise
      1. What Is a Trusted Curator?
    7. Available Tools
    8. Summary
    9. Exercises
  3. 2. Differential Privacy Fundamentals
    1. Intuitive Privacy
      1. Privacy Unit
      2. Privacy Loss
    2. Formalizing the Concept of Differential Privacy
      1. Randomized Response
      2. Privacy Violation
    3. Models of Differential Privacy
    4. Sensitivity
    5. Differentially Private Mechanisms
      1. Laplace Mechanism
      2. The Laplace Mechanism is ϵ-DP
      3. Mechanism Accuracy
      4. Most Common Family Type Among Students
      5. Exponential Mechanism
    6. Composition
    7. Post-Processing Immunity
    8. Implementing Differentially Private Queries with SmartNoise
      1. Example 1: Differentially Private Counts
      2. Example 2: Differentially Private Sum
      3. Example 3: Multiple Queries from a Single Database
    9. Summary
    10. Exercises
  4. 3. Stable Transformations
    1. Distance Metrics
      1. Data Set Adjacency
      2. Bounded vs. Unbounded Differential Privacy
    2. Definition of a c-Stable Transformation
      1. Transformation: Double
      2. Transformation: Row-by-row
    3. Stability is a Necessary and Sufficient Condition for Sensitivity
      1. Transformation: Count
      2. Transformation: Unknown-Size Sum
    4. Domain Descriptors
      1. Transformation: Data Clipping
    5. Chaining
    6. Metric Spaces
    7. Definition of Stability
      1. Transformation: Known-Size Sum
      2. Transformation: Known-Size Mean
      3. Transformation: Unknown-Size Mean
      4. Transformation: Resize
      5. Recap of Scalar Aggregators
    8. Vector-Valued Aggregators
      1. Vector Norm, Distance and Sensitivity
      2. Aggregating Data with Bounded Norm
      3. Grouped Data
    9. In Practice
    10. Summary
    11. Exercises
  5. 4. Private Mechanisms
    1. Privacy Measure
      1. Privacy Measure: Max-Divergence
      2. Metric vs. Divergence vs. Privacy Measure
    2. Private Mechanisms
      1. Randomized Response
      2. The Vector Laplace Mechanism
      3. Exponential Mechanism
      4. Quantile Score Transformation
      5. Report Noisy Max Mechanisms
    3. Interactivity
    4. Above Threshold
      1. Streams
      2. Online Private Selection
      3. Stable Transformations on Streams
    5. Summary
    6. Exercises
  6. 5. Definitions of Privacy
    1. The Privacy Loss Random Variable
    2. Approximate Differential Privacy
      1. Truncated Noise Mechanisms
      2. Propose-Test-Release
      3. (Advanced) Composition
    3. The Gaussian Mechanism
    4. Rényi Differential Privacy
      1. zero-Concentrated Differential Privacy (zCDP)
      2. Strength of Moments-Based Privacy Measures
    5. Bounded Range
    6. Privacy Loss Distributions
      1. Numerical Composition
      2. Characteristic Functions
    7. Hypothesis Testing Interpretation
      1. f-Differential Privacy
    8. Summary
    9. Exercises
  7. 6. Fearless Combinators
    1. Chaining
      1. Example: Bounds Estimation
      2. Example: B-Tree
    2. Privacy Measure Conversion
    3. Composition
      1. Adaptivity
      2. Odometers and Filters
    4. Partitioned Data
      1. Example: Grouping on Asylum Seeker Data
      2. Parallel Composition
      3. Example: Multi-Quantiles
    5. Privacy Amplification
      1. Privacy Amplification by Simple Random Sampling
      2. Privacy Amplification by Poisson Sampling
      3. Privacy Amplification by Shuffling
    6. Sample and Aggregate
    7. Private Selection from Private Candidates
      1. Example: K-Means
    8. Summary
    9. Exercises
  8. 7. Eyes on the Privacy Unit
    1. Levels of Privacy
      1. User-Level Privacy in Practice
    2. Browser Logs Example: A Naive Event-level Guarantee
    3. Data Sets with Unbounded Contributions
      1. Statistics with Constant Sensitivity
    4. Data Set Truncation
      1. Truncation on Partitioned Data
      2. Hospital Visits Example: A Bias-Variance Tradeoff
    5. Privately Estimating the Truncation Threshold
      1. Further Analysis with Unbounded Contributions
    6. Unknown Domain
    7. When to Apply Truncation
      1. Stable Grouping Transformations
      2. Stable Union Transformation
      3. Stable Join Transformations
    8. Summary
    9. Exercises
  9. 8. Differentially Private Statistical Modeling
    1. Private Inference
    2. Differentially Private Linear Regression
      1. Sufficient Statistics Perturbation
      2. Private Theil-Sen Estimator
      3. Objective Function Perturbation
    3. Algorithm Selection
    4. Differentially Private Naive Bayes
      1. Categorical Naive Bayes
      2. Continuous Naive Bayes
      3. Mechanism Design
      4. Example: Naive Bayes
    5. Tree-based Algorithms
    6. Summary
    7. Exercises
  10. 9. Differentially Private Machine Learning
    1. Why Make Machine Learning Models Differentially Private?
    2. Machine Learning Terminology Recap
    3. Differentially Private Gradient Descent (DP-GD)
      1. Example: Minimum Viable DP-GD
    4. Stochastic Batching (DP-SGD)
      1. Parallel Composition
      2. Privacy Amplification by Subsampling
      3. Hyperparameter Tuning
    5. Private Aggregations of Teacher Ensembles
    6. Training Differentially Private Models with PyTorch
      1. Example: Predicting Income Privately
    7. Summary
    8. Exercises
  11. 10. Differentially Private Synthetic Data
    1. Defining Synthetic Data
      1. Types of Synthetic Data
    2. Practical Scenarios for Synthetic Data Usage
    3. Marginal-Based Synthesizers
      1. Multiplicative Weights update rule with the Exponential Mechanism
    4. Graphical Models
      1. PrivBayes
    5. GAN Synthesizers
      1. Potential Problems
    6. Summary
    7. Exercises
  12. 11. Protecting Your Data Against Privacy Attacks
    1. Definition of a Privacy Violation
    2. Attacks on Tabular Data Sets
      1. Record Linkage
      2. Singling Out
      3. Differencing Attack
      4. Least Squares Solution
      5. Tracing
      6. K-anonymity Vulnerabilities
    3. Attacks on Machine Learning
    4. Summary
    5. Exercises
  13. 12. Defining Privacy Loss Parameters of a Data Release
    1. Sampling
    2. Metadata Parameters
    3. Allocating Privacy Loss Budget
    4. Practices that Aid Decision-Making
      1. Codebook and Data Annotation
      2. Translating Contextual Norms into Parameters
    5. Making These Decisions in the Context of Exploratory Data Analysis
    6. Adaptively Choosing Privacy Parameters
    7. Potential (Unexpected) Consequences of Transparent Parameter Selection
    8. Summary
    9. Exercises
  14. 13. Planning Your First DP Project
    1. DP Deployment Considerations
      1. Frequency of DP Deployments
      2. Composition and Budget Accountability
    2. DP Deployment Checklist
    3. An Example Project: Back to the Classroom
    4. Proper Real World Data Publications
      1. LinkedIn’s Economic Graph
      2. Microsoft’s Broadband Data
    5. DP Release Table: A Standard for Releasing Details About Your Release
    6. That’s All, Folks
  15. Further Reading
    1. Theory
    2. Applications
  16. A. Supplementary Definitions
  17. B. Rényi Differential Privacy
    1. Theorem: Rényi DP Is Immune to Post-Processing
      1. Proof
    2. Theorem: Young’s Inequality
      1. Proof via Calculus
    3. Elementary Proof
    4. Theorem: Hölder’s Inequality
      1. Proof
    5. Theorem: Probability Preservation
      1. Proof
    6. Theorem: RDP to (ϵ, δ)-DP
      1. Proof
  18. C. The Exponential Mechanism Satisfies Bounded Range
    1. Proof
  19. D. Structured Query Language (SQL)
  20. E. Composition Proofs
    1. Theorem: Basic Sequential Composition
      1. Proof
    2. General sequential composition
    3. Theorem
      1. Proof
    4. Theorem: Parallel Composition
      1. Proof
    5. Theorem: Immunity to post-processing
      1. Proof
  21. F. Machine Learning
    1. Supervised vs. Unsupervised Learning
    2. Gradient Descent
      1. Using Gradient Descent to Learn Parameters
      2. Stochastic Gradient Descent
  22. G. Where to Find Solutions
  23. About the Authors

Product information

  • Title: Hands-On Differential Privacy
  • Author(s): Ethan Cowan, Michael Shoemate, Mayana Pereira
  • Release date: May 2024
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492097747