Privacy-Preserving Machine Learning

Book description

Keep sensitive user data safe and secure without sacrificing the performance and accuracy of your machine learning models.

In Privacy-Preserving Machine Learning, you will learn:

  • Privacy considerations in machine learning
  • Differential privacy techniques for machine learning
  • Privacy-preserving synthetic data generation
  • Privacy-enhancing technologies for data mining and database applications
  • Compressive privacy for machine learning

Privacy-Preserving Machine Learning is a comprehensive guide to avoiding data breaches in your machine learning projects. You’ll get to grips with modern privacy-enhancing techniques such as differential privacy, compressive privacy, and synthetic data generation. The book is based on years of DARPA-funded cybersecurity research, and ML engineers of all skill levels will benefit from incorporating its privacy-preserving practices into their model development. By the time you’re done reading, you’ll be able to create machine learning systems that preserve user privacy without sacrificing data quality and model performance.

About the Technology
Machine learning applications need massive amounts of data. It’s up to you to keep the sensitive information in those data sets private and secure. Privacy preservation happens at every point in the ML process, from data collection and ingestion to model development and deployment. This practical book teaches you the skills you’ll need to secure your data pipelines end to end.

About the Book
Privacy-Preserving Machine Learning explores privacy preservation techniques through real-world use cases in facial recognition, cloud data storage, and more. You’ll learn about practical implementations you can deploy now, future privacy challenges, and how to adapt existing technologies to your needs. Your new skills build toward a complete privacy-enhanced data platform project, which you’ll develop in the final chapter.

What's Inside
  • Differential and compressive privacy techniques
  • Privacy for frequency or mean estimation, naive Bayes classifier, and deep learning
  • Privacy-preserving synthetic data generation
  • Enhanced privacy for data mining and database applications


About the Reader
For machine learning engineers and developers. Examples in Python and Java.

About the Authors
J. Morris Chang is a professor at the University of South Florida. His research projects have been funded by DARPA and the DoD. Di Zhuang is a security engineer at Snap Inc. G. Dumindu Samaraweera is an assistant research professor at the University of South Florida. The technical editor for this book, Wilko Henecka, is a senior software engineer at Ambiata, where he builds privacy-preserving software.

Quotes
A detailed treatment of differential privacy, synthetic data generation, and privacy-preserving machine-learning techniques with relevant Python examples. Highly recommended!
- Abe Taha, Google

A wonderful synthesis of theoretical and practical. This book fills a real need.
- Stephen Oates, Allianz

The definitive source for creating privacy-respecting machine learning systems. This area in data-rich environments is so important to understand!
- Mac Chambers, Roy Hobbs Diamond Enterprises

Covers all aspects for data privacy, with good practical examples.
- Vidhya Vinay, Streamingo Solutions

Table of contents

  1. inside front cover
  2. Privacy-Preserving Machine Learning
  3. Copyright
  4. contents
  5. front matter
    1. preface
    2. acknowledgments
    3. about this book
      1. Who should read this book
      2. How this book is organized: A road map
      3. About the code
      4. liveBook discussion forum
    4. about the authors
    5. about the cover illustration
  6. Part 1 Basics of privacy-preserving machine learning with differential privacy
  7. 1 Privacy considerations in machine learning
    1. 1.1 Privacy complications in the AI era
    2. 1.2 The threat of learning beyond the intended purpose
      1. 1.2.1 Use of private data on the fly
      2. 1.2.2 How data is processed inside ML algorithms
      3. 1.2.3 Why privacy protection in ML is important
      4. 1.2.4 Regulatory requirements and the utility vs. privacy tradeoff
    3. 1.3 Threats and attacks for ML systems
      1. 1.3.1 The problem of private data in the clear
      2. 1.3.2 Reconstruction attacks
      3. 1.3.3 Model inversion attacks
      4. 1.3.4 Membership inference attacks
      5. 1.3.5 De-anonymization or re-identification attacks
      6. 1.3.6 Challenges of privacy protection in big data analytics
    4. 1.4 Securing privacy while learning from data: Privacy-preserving machine learning
      1. 1.4.1 Use of differential privacy
      2. 1.4.2 Local differential privacy
      3. 1.4.3 Privacy-preserving synthetic data generation
      4. 1.4.4 Privacy-preserving data mining techniques
      5. 1.4.5 Compressive privacy
    5. 1.5 How is this book structured?
    6. Summary
  8. 2 Differential privacy for machine learning
    1. 2.1 What is differential privacy?
      1. 2.1.1 The concept of differential privacy
      2. 2.1.2 How differential privacy works
    2. 2.2 Mechanisms of differential privacy
      1. 2.2.1 Binary mechanism (randomized response)
      2. 2.2.2 Laplace mechanism
      3. 2.2.3 Exponential mechanism
    3. 2.3 Properties of differential privacy
      1. 2.3.1 Postprocessing property of differential privacy
      2. 2.3.2 Group privacy property of differential privacy
      3. 2.3.3 Composition properties of differential privacy
    4. Summary
  9. 3 Advanced concepts of differential privacy for machine learning
    1. 3.1 Applying differential privacy in machine learning
      1. 3.1.1 Input perturbation
      2. 3.1.2 Algorithm perturbation
      3. 3.1.3 Output perturbation
      4. 3.1.4 Objective perturbation
    2. 3.2 Differentially private supervised learning algorithms
      1. 3.2.1 Differentially private naive Bayes classification
      2. 3.2.2 Differentially private logistic regression
      3. 3.2.3 Differentially private linear regression
    3. 3.3 Differentially private unsupervised learning algorithms
      1. 3.3.1 Differentially private k-means clustering
    4. 3.4 Case study: Differentially private principal component analysis
      1. 3.4.1 The privacy of PCA over horizontally partitioned data
      2. 3.4.2 Designing differentially private PCA over horizontally partitioned data
      3. 3.4.3 Experimentally evaluating the performance of the protocol
    5. Summary
  10. Part 2 Local differential privacy and synthetic data generation
  11. 4 Local differential privacy for machine learning
    1. 4.1 What is local differential privacy?
      1. 4.1.1 The concept of local differential privacy
      2. 4.1.2 Randomized response for local differential privacy
    2. 4.2 The mechanisms of local differential privacy
      1. 4.2.1 Direct encoding
      2. 4.2.2 Histogram encoding
      3. 4.2.3 Unary encoding
    3. Summary
  12. 5 Advanced LDP mechanisms for machine learning
    1. 5.1 A quick recap of local differential privacy
    2. 5.2 Advanced LDP mechanisms
      1. 5.2.1 The Laplace mechanism for LDP
      2. 5.2.2 Duchi’s mechanism for LDP
      3. 5.2.3 The Piecewise mechanism for LDP
    3. 5.3 A case study implementing LDP naive Bayes classification
      1. 5.3.1 Using naive Bayes with ML classification
      2. 5.3.2 Using LDP naive Bayes with discrete features
      3. 5.3.3 Using LDP naive Bayes with continuous features
      4. 5.3.4 Evaluating the performance of different LDP protocols
    4. Summary
  13. 6 Privacy-preserving synthetic data generation
    1. 6.1 Overview of synthetic data generation
      1. 6.1.1 What is synthetic data? Why is it important?
      2. 6.1.2 Application aspects of using synthetic data for privacy preservation
      3. 6.1.3 Generating synthetic data
    2. 6.2 Assuring privacy via data anonymization
      1. 6.2.1 Private information sharing vs. privacy concerns
      2. 6.2.2 Using k-anonymity against re-identification attacks
      3. 6.2.3 Anonymization beyond k-anonymity
    3. 6.3 DP for privacy-preserving synthetic data generation
      1. 6.3.1 DP synthetic histogram representation generation
      2. 6.3.2 DP synthetic tabular data generation
      3. 6.3.3 DP synthetic multi-marginal data generation
    4. 6.4 Case study on private synthetic data release via feature-level micro-aggregation
      1. 6.4.1 Using hierarchical clustering and micro-aggregation
      2. 6.4.2 Generating synthetic data
      3. 6.4.3 Evaluating the performance of the generated synthetic data
    5. Summary
  14. Part 3 Building privacy-assured machine learning applications
  15. 7 Privacy-preserving data mining techniques
    1. 7.1 The importance of privacy preservation in data mining and management
    2. 7.2 Privacy protection in data processing and mining
      1. 7.2.1 What is data mining and how is it used?
      2. 7.2.2 Consequences of privacy regulatory requirements
    3. 7.3 Protecting privacy by modifying the input
      1. 7.3.1 Applications and limitations
    4. 7.4 Protecting privacy when publishing data
      1. 7.4.1 Implementing data sanitization operations in Python
      2. 7.4.2 k-anonymity
      3. 7.4.3 Implementing k-anonymity in Python
    5. Summary
  16. 8 Privacy-preserving data management and operations
    1. 8.1 A quick recap of privacy protection in data processing and mining
    2. 8.2 Privacy protection beyond k-anonymity
      1. 8.2.1 l-diversity
      2. 8.2.2 t-closeness
      3. 8.2.3 Implementing privacy models with Python
    3. 8.3 Protecting privacy by modifying the data mining output
      1. 8.3.1 Association rule hiding
      2. 8.3.2 Reducing the accuracy of data mining operations
      3. 8.3.3 Inference control in statistical databases
    4. 8.4 Privacy protection in data management systems
      1. 8.4.1 Database security and privacy: Threats and vulnerabilities
      2. 8.4.2 How likely is a modern database system to leak private information?
      3. 8.4.3 Attacks on database systems
      4. 8.4.4 Privacy-preserving techniques in statistical database systems
      5. 8.4.5 What to consider when designing a customizable privacy-preserving database system
    5. Summary
  17. 9 Compressive privacy for machine learning
    1. 9.1 Introducing compressive privacy
    2. 9.2 The mechanisms of compressive privacy
      1. 9.2.1 Principal component analysis (PCA)
      2. 9.2.2 Other dimensionality reduction methods
    3. 9.3 Using compressive privacy for ML applications
      1. 9.3.1 Implementing compressive privacy
      2. 9.3.2 The accuracy of the utility task
      3. 9.3.3 The effect of ρ' in DCA for privacy and utility
    4. 9.4 Case study: Privacy-preserving PCA and DCA on horizontally partitioned data
      1. 9.4.1 Achieving privacy preservation on horizontally partitioned data
      2. 9.4.2 Recapping dimensionality reduction approaches
      3. 9.4.3 Using additive homomorphic encryption
      4. 9.4.4 Overview of the proposed approach
      5. 9.4.5 How privacy-preserving computation works
      6. 9.4.6 Evaluating the efficiency and accuracy of the privacy-preserving PCA and DCA
    5. Summary
  18. 10 Putting it all together: Designing a privacy-enhanced platform (DataHub)
    1. 10.1 The significance of a research data protection and sharing platform
      1. 10.1.1 The motivation behind the DataHub platform
      2. 10.1.2 DataHub’s important features
    2. 10.2 Understanding the research collaboration workspace
      1. 10.2.1 The architectural design
      2. 10.2.2 Blending different trust models
      3. 10.2.3 Configuring access control mechanisms
    3. 10.3 Integrating privacy and security technologies into DataHub
      1. 10.3.1 Data storage with a cloud-based secure NoSQL database
      2. 10.3.2 Privacy-preserving data collection with local differential privacy
      3. 10.3.3 Privacy-preserving machine learning
      4. 10.3.4 Privacy-preserving query processing
      5. 10.3.5 Using synthetic data generation in the DataHub platform
    4. Summary
  19. Appendix A. More details about differential privacy
    1. A.1 The formal definition of differential privacy
    2. A.2 Other differential privacy mechanisms
      1. A.2.1 Geometric mechanism
      2. A.2.2 Gaussian mechanism
      3. A.2.3 Staircase mechanism
      4. A.2.4 Vector mechanism
      5. A.2.5 Wishart mechanism
    3. A.3 Formal definitions of composition properties of DP
      1. A.3.1 The formal definition of sequential composition DP
      2. A.3.2 The formal definition of parallel composition DP
  20. references
    1. Appendix
  21. index
  22. inside back cover

Product information

  • Title: Privacy-Preserving Machine Learning
  • Author(s): Di Zhuang, Morris Chang, Dumindu Samaraweera
  • Release date: May 2023
  • Publisher(s): Manning Publications
  • ISBN: 9781617298042