Practical Data Privacy

Book description

Between major privacy regulations like the GDPR and CCPA and costly, notorious data breaches, there has never been so much pressure to ensure data privacy. Unfortunately, integrating privacy into data systems is still complicated. This essential guide will give you a fundamental understanding of modern privacy building blocks, like differential privacy, federated learning, and encrypted computation. Based on hard-won lessons, this book provides solid advice and best practices for integrating breakthrough privacy-enhancing technologies into production systems.

Practical Data Privacy answers important questions such as:

  • What do privacy regulations like GDPR and CCPA mean for my data workflows and data science use cases?
  • What does "anonymized data" really mean? How do I actually anonymize data?
  • How do federated learning and analysis work?
  • Homomorphic encryption sounds great, but is it ready for use?
  • How do I compare and choose the best privacy-preserving technologies and methods? Are there open-source libraries that can help?
  • How do I ensure that my data science projects are secure by default and private by design?
  • How do I work with governance and infosec teams to implement internal policies appropriately?

Table of contents

  1. Foreword
  2. Preface
    1. What Is Data Privacy?
    2. Who Should Read This Book
      1. Privacy Engineering
    3. Why I Wrote This Book
    4. Navigating This Book
    5. Conventions Used in This Book
    6. Using Code Examples
    7. O’Reilly Online Learning
    8. Acknowledgments
  3. 1. Data Governance and Simple Privacy Approaches
    1. Data Governance: What Is It?
    2. Identifying Sensitive Data
      1. Identifying PII
    3. Documenting Data for Use
      1. Basic Data Documentation
      2. Finding and Documenting Unknown Data
      3. Tracking Data Lineage
      4. Data Version Control
    4. Basic Privacy: Pseudonymization for Privacy by Design
    5. Summary
  4. 2. Anonymization
    1. What Is Anonymization?
    2. Defining Differential Privacy
    3. Understanding Epsilon: What Is Privacy Loss?
    4. What Differential Privacy Guarantees, and What It Doesn’t
    5. Understanding Differential Privacy
      1. Differential Privacy in Practice: Anonymizing the US Census
    6. Differential Privacy with the Laplace Mechanism
      1. Differential Privacy with Laplace: A Naive Attempt
      2. Sensitivity and Error
      3. Privacy Budgets and Composition
    7. Exploring Other Mechanisms: Gaussian Noise for Differential Privacy
      1. Comparing Laplace and Gaussian Noise
      2. Real-World Differential Privacy: Debiasing Noisy Results
    8. Sensitivity and Privacy Units
    9. What About k-Anonymity?
    10. Summary
  5. 3. Building Privacy into Data Pipelines
    1. How to Build Privacy into Data Pipelines
      1. Design Appropriate Privacy Measures
      2. Meet Users Where They Are
      3. Engineer Privacy In
      4. Test and Verify
    2. Engineering Privacy and Data Governance into Pipelines
      1. An Example Data Sharing Workflow
      2. Adding Provenance and Consent Information to Collection
    3. Using Differential Privacy Libraries in Pipelines
    4. Collecting Data Anonymously
      1. Apple’s Differentially Private Data Collection
      2. Why Chrome’s Original Differential Privacy Collection Died
    5. Working with Data Engineering Team and Leadership
      1. Share Responsibility
      2. Create Workflows with Documentation and Privacy
      3. Privacy as a Core Value Proposition
    6. Summary
  6. 4. Privacy Attacks
    1. Privacy Attacks: Analyzing Common Attack Vectors
      1. Netflix Prize Attack
      2. Linkage Attacks
      3. Singling Out Attacks
      4. Strava Heat Map Attack
      5. Membership Inference Attack
      6. Inferring Sensitive Attributes
      7. Other Model Leakage Attacks: Memorization
      8. Model-Stealing Attacks
      9. Attacks Against Privacy Protocols
    2. Data Security
      1. Access Control
      2. Data Loss Prevention
      3. Extra Security Controls
      4. Threat Modeling and Incident Response
    3. Probabilistic Reasoning About Attacks
      1. An Average Attacker
      2. Measuring Risk, Assessing Threats
    4. Data Security Mitigations
      1. Applying Web Security Basics
      2. Protecting Training Data and Models
      3. Staying Informed: Learning About New Attacks
    5. Summary
  7. 5. Privacy-Aware Machine Learning and Data Science
    1. Using Privacy-Preserving Techniques in Machine Learning
      1. Privacy-Preserving Techniques in a Typical Data Science or ML Workflow
      2. Privacy-Preserving Machine Learning in the Wild
      3. Differentially Private Stochastic Gradient Descent
    2. Open Source Libraries for PPML
      1. Engineering Differentially Private Features
      2. Applying Simpler Methods
      3. Documenting Your Machine Learning
      4. Other Ways of Protecting Privacy in Machine Learning
    3. Architecting Privacy in Data and Machine Learning Projects
      1. Understanding Your Data Privacy Needs
      2. Monitoring Privacy
    4. Summary
  8. 6. Federated Learning and Data Science
    1. Distributed Data
      1. Why Use Distributed Data?
      2. How Does Distributed Data Analysis Work?
      3. Privacy-Charging Distributed Data with Differential Privacy
    2. Federated Learning
      1. Federated Learning: A Brief History
      2. Why, When, and How to Use Federated Learning
    3. Architecting Federated Systems
      1. Example Deployment
      2. Security Threats
      3. Use Cases
      4. Deploying Federated Libraries and Tools
    4. Open Source Federated Libraries
      1. Flower: Unified OSS for Federated Learning Libraries
    5. A Federated Data Science Future Outlook
    6. Summary
  9. 7. Encrypted Computation
    1. What Is Encrypted Computation?
    2. When to Use Encrypted Computation
      1. Privacy Versus Secrecy
      2. Threat Modeling
    3. Types of Encrypted Computation
      1. Secure Multiparty Computation
      2. Homomorphic Encryption
    4. Real-World Encrypted Computation
      1. Private Set Intersection
      2. Private Join and Compute
      3. Secure Aggregation
      4. Encrypted Machine Learning
    5. Getting Started with PSI and Moose
    6. Imagining a World with Secure Data Sharing
    7. Summary
  10. 8. Navigating the Legal Side of Privacy
    1. GDPR: An Overview
      1. Fundamental Data Rights Under GDPR
      2. Data Controller Versus Data Processor
      3. Applying Privacy-Enhancing Technologies for GDPR
      4. GDPR’s Data Protection Impact Assessment: Agile and Iterative Risk Assessments
      5. Right to an Explanation: Interpretability and Privacy
    2. California Consumer Privacy Act (CCPA)
      1. Applying PETs for CCPA
    3. Other Regulations: HIPAA, LGPD, PIPL, and More!
    4. Internal Policies and Contracts
      1. Reading Privacy Policies and Terms of Service
      2. Reading Data Processing Agreements
      3. Reading Policies, Guidelines, and Contracts
    5. Working with Legal Professionals
      1. Adhering to Contractual Agreements and Contract Law
      2. Interpreting Data Protection Regulations
      3. Asking for Help and Advice
      4. Working Together on Shared Definitions and Ideas
      5. Providing Technical Guidance
    6. Data Governance 2.0
      1. What Is Federated Governance?
      2. Supporting a Culture of Experimentation
      3. Documentation That Works, Platforms with PETs
    7. Summary
  11. 9. Privacy and Practicality Considerations
    1. Getting Practical: Managing Privacy and Security Risk
      1. Evaluating and Managing Privacy Risk
      2. Embracing Uncertainty While Planning for the Future
    2. Practical Privacy Technology: Use-Case Analysis
      1. Federated Marketing: Guiding Marketing Campaigns with Privacy Built In
      2. Public-Private Partnerships: Sharing Data for Public Health
      3. Anonymized Machine Learning: Looking for GDPR Compliance in Iterative Training Settings
      4. Business-to-Business Application: Hands-Off Data
    3. Step-by-Step: How to Integrate and Automate Privacy in ML
      1. Iterative Discovery
      2. Documenting Privacy Requirements
      3. Evaluating and Combining Approaches
      4. Shifting to Automation
      5. Making Privacy Normal
    4. Embracing the Future: Working with Research Libraries and Teams
      1. Working with External Researchers
      2. Investing in Internal Research
    5. Summary
  12. 10. Frequently Asked Questions (and Their Answers!)
    1. Encrypted Computation and Confidential Computing
      1. Is Secure Computation Quantum-Safe?
      2. Can I Use Enclaves to Solve Data Privacy or Data Secrecy Problems?
      3. What If I Need to Protect the Privacy of the Client or User Who Sends the Database Query or Request?
      4. Do Clean Rooms or Remote Data Analysis/Access Solve My Privacy Problem?
      5. I Want to Provide Perfect Privacy or Perfect Secrecy. Is That Possible?
      6. How Do I Determine That an Encrypted Computation Is Secure Enough?
      7. If I Want to Use Encrypted Computation, How Do I Manage Key Rotation?
      8. What Is Google’s Privacy Sandbox? Does It Use Encrypted Computation?
    2. Data Governance and Protection Mechanisms
      1. Why Isn’t k-Anonymity Enough?
      2. I Don’t Think Differential Privacy Works for My Use Case. What Do I Do?
      3. Can I Use Synthetic Data to Solve Privacy Problems?
      4. How Should Data Be Shared Ethically or What Are Alternatives to Selling Data?
      5. How Can I Find All the Private Information That I Need to Protect?
      6. I Dropped the Personal Identifiers, so the Data Is Safe Now, Right?
      7. How Do I Reason About Data I Released in the Past?
      8. I’m Working on a BI Dashboard or Visualization. How Do I Make It Privacy-Friendly?
      9. Who Makes Privacy Engineering Decisions? How Do I Fit Privacy Engineering into My Organization?
      10. What Skills or Background Do I Need to Become a Privacy Engineer?
      11. Why Didn’t You Mention (Insert Technology or Company Here)? How Do I Learn More? Help!
    3. GDPR and Data Protection Regulations
      1. Do I Really Need to Use Differential Privacy to Remove Data from GDPR/CPRA/LGPD/etc. Requirements?
      2. I Heard That I Can Use Personal Data Under GDPR for Legitimate Interest. Is That Correct?
      3. I Want to Comply with Schrems II and Transatlantic Data Flows. What Are Possible Solutions?
    4. Personal Choices and Social Privacy
      1. What Email Provider, Browser, and Application Should I Use if I Care About My Privacy?
      2. My Friend Has an Automated Home or Phone Assistant. I Don’t Want It Listening to Me. What Should I Do?
      3. I Gave Up on Privacy a Long Time Ago. I Have Nothing to Hide. Why Should I Change?
      4. Can I Just Sell My Own Data to Companies?
      5. I Like Personalized Ads. Why Don’t You?
      6. Is (Fill in the Blank) Listening to Me? What Should I Do About It?
    5. Summary
  13. 11. Go Forth and Engineer Privacy!
    1. Surveillance Capitalism and Data Science
      1. Gig Workers and Surveillance at Work
      2. Surveillance for “Security”
      3. Luxury Surveillance
    2. Vast Data Collection and Society
      1. Machine Learning as Data Laundering
      2. Disinformation and Misinformation
    3. Fighting Back
      1. Researching, Documenting, Hacking, Learning
      2. Collectivizing Data
      3. Regulation Fining Back
      4. Supporting Community Work
    4. Privacy Champions
      1. Your Privacy-Aware Multitool
      2. Building Trustworthy Machine Learning Systems
      3. Privacy by Design
      4. Privacy and Power
    5. Tschüss
  14. Index
  15. About the Author

Product information

  • Title: Practical Data Privacy
  • Author(s): Katharine Jarmul
  • Release date: April 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098129460