Chapter 17. Privacy and Legal Requirements
Data privacy is becoming an important part of ML projects. There’s an increasing push toward ethical AI and a growing number of legal requirements around data privacy. Many of the predictions made by ML models are based on personal data collected from users, so it’s important to know the strategies for increasing privacy in ML pipelines and to have some knowledge of the laws and regulations in this area.
Before you even start building your ML pipelines, it’s essential to be transparent with your users about what data you are collecting. You should ensure that you have your users’ consent to use their data, and you should limit data collection to what’s necessary to train your models. Once you have these fundamental principles in place, you can look at the privacy-preserving ML options we describe in this chapter to provide even greater privacy for your users.
At the time of this writing, there is always a cost to privacy: increasing privacy for our users comes with a cost in model accuracy, computation time, or both. At one extreme, collecting no data keeps an interaction completely private but is completely useless for ML. At the other extreme, knowing all the details about a person might endanger that person’s privacy, but it allows us to make very accurate ML models. We’re starting to see the development of privacy-preserving ML, in which privacy can be increased without such a large ...