AI & ML Business Data Innovation Research Security

Try the O’Reilly learning platform

With the O’Reilly learning platform, you get the resources and guidance to keep your skills sharp and stay ahead. Try it free for up to 14 days.

Start trial

Try a course for free

Join a live online event on the O’Reilly platform to learn from the experts shaping tech.

See what’s coming soon

Get the Radar Trends newsletter

Your email

Country

Please read our privacy policy.

Radar > Topics > AI & ML

How privacy-preserving techniques can lead to more robust machine learning models

The O’Reilly Data Show Podcast: Chang Liu on operations research, and the interplay between differential privacy and machine learning.

By Ben Lorica August 2, 2018 • 00:36:43 listen

LinkedIn X Facebook Threads Bluesky Reddit

O'Reilly Data Show Podcast

How privacy-preserving techniques can lead to more robust machine learning models

00:00 / 00:36:43

In this episode of the Data Show, I spoke with Chang Liu, applied research scientist at Georgian Partners. In a previous post, I highlighted early tools for privacy-preserving analytics, both for improving decision-making (business intelligence and analytics) and for enabling automation (machine learning). One of the tools I mentioned is an open source project for SQL-based analysis that adheres to state-of-the-art differential privacy (a formal guarantee that provides robust privacy assurances). Since business intelligence typically relies on SQL databases, this open source project is something many companies can already benefit from today.

What about machine learning? While I didn’t have space to point this out in my previous post, differential privacy has been an area of interest to many machine learning researchers. Most practicing data scientists aren’t aware of the research results, and popular data science tools haven’t incorporated differential privacy in meaningful ways (if at all). But things will change over the next months. For example, Liu wants to make ideas from differential privacy accessible to industrial data scientists, and she is part of a team building tools to make this happen.

Here are some highlights from our conversation:

Differential privacy and machine learning

In the literature, there are actually multiple ways differential privacy is used in machine learning. We can either inject noise directly at the input data level, or while we’re training a model. We can also inject noise into the gradient. At every iteration we’re computing the gradients, we can inject some sort of noise. Or we can also inject noise during aggregation. If we’re using ensembles, we can inject noise there. And we can also inject noise at the output level. So after we’ve trained the model, and we have our vectors of weights, then we can also inject noise directly to the weights.

A mechanism for building robust models

There could be a chance that differential privacy methods can actually make your model more general. Because, essentially, when models memorize their training data, it could be due to overfitting. So, injecting all of this noise may help the resulting model move you further away from overfitting, and you get a more general model.

Related resources:

“How to build analytic products in an age when data privacy has become critical”
“Managing risk in machine learning models”: Andrew Burt and Steven Touw on how companies can manage models they cannot fully explain.
“Data regulations and privacy discussions are still in the early stages”: Aurélie Pols on GDPR, ethics, and ePrivacy.
“Data collection and data markets in the age of privacy and machine learning”

Post topics: AI & ML•Data•O'Reilly Data Show

Post tags: Podcast

Cloud Computing

Data Engineering

Data Science

AI & ML

Programming Languages

Software Architecture

IT/Ops

Security

Design

Business

Soft Skills

Try the O’Reilly learning platform

Try a course for free

Get the Radar Trends newsletter

Thank you for subscribing to the O’Reilly Radar Trends to Watch newsletter.

How privacy-preserving techniques can lead to more robust machine learning models

Differential privacy and machine learning

A mechanism for building robust models