Chapter 1. Privacy
If you’ve been paying any attention to the media, then you’re at least somewhat aware of the damage that can follow when a company’s customer data or proprietary algorithms are leaked. Given that the field of machine learning (ML) requires enormous amounts of data almost by definition, the risk is especially glaring.
Attack Vectors for Machine Learning Pipelines
Shortly after computers were invented, so were methods for attacking them. To catalog these methods, the MITRE Corporation created MITRE ATT&CK, a taxonomy of the tactics and techniques hackers use to attack systems.
The emergence of machine learning created additional ways in which computer systems can be attacked. In fact, there's a machine learning–specific version of MITRE ATT&CK: MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems). Just as attackers and adversaries have long sought to steal data from and take control of computer systems in general, machine learning pipelines face the same risks.
This chapter goes into a series of techniques and technologies that can mitigate the risk of privacy leaks. While these techniques represent the intersection of practical best practices and state-of-the-art research, no tool is perfect. Some of these technologies can backfire if not properly implemented or if you focus on only one definition of privacy.
Improperly Implemented Privacy Features in ML: Case Studies
Before we dive into mathematical privacy definitions, let’s first get an understanding of what improperly implemented privacy features look like in the real world and what consequences might arise from them.
Many of the data privacy laws you'll encounter are aimed at punishing data leaks. Where laws do not deter people, organizational and technological safeguards are needed. All of these are designed to place an enormous cost on obtaining the data in question. The problem is that for some bad actors, the value of the data still far exceeds the time and monetary cost of obtaining it.
On the consumer side, in China there’s an extensive black market for personal data. Malicious actors can buy mobile phone location and movement data, credit information, academic records, and phone records for as little as $0.01 (though these will fetch higher prices depending on the individual). For a data breach of thousands or millions of individuals, the financial incentive becomes clear. Information like healthcare records typically fetches more. According to Experian, a single patient record can sell for upwards of $1,000 on the black market, depending on how complete the record is; this is nearly 50 times higher than standard credit card records.
There is also a large market for proprietary company information. It’s difficult to quantify the value of having your competitors’ information. In most cases it’s pretty high, especially if the information is the data that their analytics pipeline was trained on or the mission-critical models that were trained over hundreds or thousands of computing hours.
Of course, there's much more than money to be gained from stealing information. Nation-state actors may have motivations ranging from achieving clear-cut national-security objectives, to gathering blackmail material, to causing destabilization, or even the vaguer principle that "it's better to have data and not need it than to need data and not have it."
As of this writing, we haven’t seen attacks on machine learning models on a scale comparable to some of the larger data leaks (though comparing yourself favorably to Meta’s breach of 530 million users’ information is a low bar). Part of the reason is that the usual routes of attacks on unsecured frontends and backends are still easy enough to be profitable. If a product or service has removed much of the low-hanging fruit, hackers may turn to attacks on ML models themselves to get what they want.
Case 1: Apple’s CSAM
Apple made headlines in 2021 when it announced a new system for tackling child abuse and child trafficking. The Child Sexual Abuse Material (CSAM) detection system was originally planned for release with iOS 15. Its most notable feature was an on-device ML model that would check all photos sent and received for CSAM, and would match and tag photos on-device before they were uploaded to iCloud. This matching would be done via Apple's NeuralHash algorithm. Inspired by the checksum hashing used to verify software integrity, the model would base the image hash on the presence or absence of certain high-level details in the photo.
The key detail here is the use of on-device networks to do the matching. Instead of collecting data from all devices, storing it on a central oracle, and then running an ML model on the collected data, the NeuralHash model would only be run at the user endpoint and alert Apple if a certain threshold of hits were detected. In theory, this would allow the system to respect end-to-end encryption while still being able to run the model on customer data. Unfortunately, the general public did not take kindly to this approach and saw it as an invasion of privacy. Much can be said about the public relations debacle stemming from Apple scanning private photos while labeling itself as a “privacy first” company, but we’ll focus on the much more important technical errors in the CSAM-scanning system.
Apple’s first mistake was putting so much stake in the integrity of the NeuralHash algorithm. Hashing algorithms used in security contexts typically go through decades-long competitions before being adopted as standards. The exact behavior of a neural network in all possible scenarios is impossible to verify with certainty. In fact, shortly after the release of NeuralHash, users created collision attacks that could add imperceptible modifications to any photo to make the network identify the image as offensive content.
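To see why that is hard, it helps to know roughly what a perceptual image hash does. The toy "average hash" below is not NeuralHash, just a classic baseline shown for illustration: it reduces an image to a 64-bit fingerprint based on coarse brightness structure, so any two images whose coarse structure matches collide to the same hash. That collision property, with a learned feature extractor in place of raw pixels, is what the attacks against NeuralHash exploited.

import numpy as np
from PIL import Image

def average_hash(path, hash_size=8):
    """A toy perceptual hash: downscale, grayscale, threshold against the mean."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = np.asarray(img, dtype=np.float32)
    bits = (pixels > pixels.mean()).flatten()
    # Pack the 64 bits into a single integer fingerprint
    return int("".join("1" if b else "0" for b in bits), 2)

# Two visually similar (or carefully perturbed) images can produce the same hash:
# average_hash("photo_a.png") == average_hash("photo_b.png")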
The second mistake was what journalists, developers, and security engineers following the situation perceived as a lack of control over the training data. Apple claimed that it was only training the NeuralHash algorithm to match photo features found in law enforcement databases. In countries like the US, state and federal law enforcement agencies maintain databases of confiscated child exploitation material, and arresting pedophiles is an uncontroversial goal in most of the world. However, Apple products and services are sold in over 52 countries as of 2020. Much of this distribution is dependent on Apple cooperating with governments. What happens if some nation wants to scan for something different? For example, what if an authoritarian government or political faction wants to use NeuralHash to scan for slogans of opposition parties or images of opposition politicians or activists?
This lack of specificity in the NeuralHash algorithm, plus a public lack of confidence that its use would not be restricted to Apple’s narrow stated aims, eventually made Apple delay (though not completely cancel) the release of this feature.
Case 2: GitHub Copilot
In June 2021, in partnership with OpenAI, GitHub released Copilot, a tool that can autocomplete code based on training on public GitHub repos. Copilot is powered by an ML model called Codex, itself based on OpenAI's GPT-3 (except trained on code instead of raw text). As a consequence, Codex can take a raw-text prompt and convert it to working code in a variety of programming languages. While it can't completely replace human programmers, Codex is adept at solving the kinds of algorithm problems one could expect in a whiteboard interview at Meta, Apple, Amazon, Netflix, or Alphabet's Google.
Codex's generalization ability is impressive, but it carries some of the same issues as GPT-3, which has been shown to be susceptible to memorization when asked to complete particularly rare or unusual prompts. Codex has the same issue, except some of the information it has memorized is either copyrighted code or accidentally exposed secrets.
This issue with exposed secrets was first reported by a SendGrid engineer who demonstrated that if you asked Copilot for API keys (the same kinds of keys that would grant selective access to mission-critical databases), Copilot would show them. Soon after, people discovered that they could prompt Codex for secrets, like AWS secret keys (e.g., someone could get privileged access to the AWS backends used by entire companies) or cryptocurrency wallet secret keys (e.g., a Bitcoin secret key would allow someone to steal any amount of Bitcoin contained in that wallet, potentially worth millions of dollars).
There are a few approaches to solving this problem. One would be to search for API keys within the training data codebase and censor them. Replacing hashes and passwords with the same X for each character would be easy, though the process of finding every single exposed password, hash, and API key would be much harder and its success could never be guaranteed. Also, legal questions were raised about Copilot’s training data and outputs. Many open source developers were rankled by GitHub’s unauthorized and unlicensed use of copyrighted source code as training data for the model and began moving away from GitHub on these grounds. It’s not always provable whether the outputs are based on proprietary code, but there have been some obvious examples. In one particularly blatant case, Copilot could reproduce Carmack’s famous inverse-square-root function from the game Quake 3. Even a skilled C developer would be unlikely to come up with this solution from scratch, but the copying is made more obvious by the inclusion of someone’s code comments.
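One hedged illustration of the censoring approach is a simple pattern-based filter like the sketch below, run over the training corpus before the model ever sees it. The regexes here are illustrative only and would miss many real-world secrets, which is exactly why the success of this approach can never be guaranteed:

import re

# Illustrative patterns only; real secret scanners use much larger rule sets plus entropy checks
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                            # AWS access key ID format
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"][^'\"]+['\"]"),  # generic api_key = "..."
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),    # PEM private key headers
]

def scrub_secrets(source_code: str, mask: str = "XXXXXXXX") -> str:
    """Replace anything matching a known secret pattern with a fixed mask."""
    for pattern in SECRET_PATTERNS:
        source_code = pattern.sub(mask, source_code)
    return source_code

print(scrub_secrets('aws_key = "AKIAABCDEFGHIJKLMNOP"'))  # aws_key = "XXXXXXXX"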
The copyright problem is much trickier to solve; it's not susceptible to just censoring small numbers of characters. A simple approach would have been to exclude codebases with certain kinds of license files from the training corpus. However, it's not clear whether including other codebases based on the mere absence of such files really counts as informed consent. Software IP lawyer Kate Downing argued that while the creation of Copilot might be technically legal, there is still much that needs to be settled in a court of law (not to mention the situation is still morally questionable). This is because GitHub has for years offered licenses like the GNU General Public License (GPL) versions 2 and 3, but it has never really advertised that a license chosen now can be changed later, or that users may be granted different permissions in the future. Both of these are features of GitHub, and had users been made more aware of them, they might not have granted GitHub such far-reaching permissions over their code. Given how many open source developers are leaving GitHub because of this usage, it's likely that many would not have consented to this use of their code.
Case 3: Model and Data Theft from No-Code ML Tools
Plenty of companies have been working on no-code models for training and deploying ML systems. For example, Google’s Teachable Machine and Microsoft’s Lobe.ai offer ways for anyone to train computer vision models. For mobile or frontend developers with very little experience in machine learning, these tools might seem magical—but they’re perfect targets for a type of attack known as a gray-box attack.1
Consider a project made with Lobe.ai, a tool that allows anyone, regardless of machine learning knowledge, to train a vision model on data from a regular file directory. If you wanted to train your model to determine whether someone is wearing a mask, you could simply take a set of images, cover them up with face masks, and make that the training data. However, a few Lobe.ai users demonstrated that its classifier is running a ResNet150V2 model. If you know the model, you can find out a lot of information about its architecture, which makes it much easier to steal the model weights (the numbers associated with a neural network's connections that store the all-important patterns, functions, and information it has learned during computationally intensive training). Such a theft would be dangerous for any organization that has spent many GPU-hours training a model and many human-hours iterating on and building a pipeline around it. After all, why spend all that time and money if it's easier to steal a competitor's proprietary model?
This is not to say that no-code tools are not valuable, but only to raise the concerns that come about when someone knows a lot about the machine learning model in question. Countless organizations use out-of-the-box architectures that can be found as part of Keras or PyTorch. With companies selling their ML models as products to be used as API interfaces, some malicious actors may take the opportunity to steal the models themselves.
Definitions
After seeing the preceding examples, you might think you have a pretty good understanding of what privacy is. When it comes to building privacy-preserving systems, definitions matter. In this section, we’ll go through some key terms that you’ll see throughout this book.
Definition of Privacy
Privacy is defined by Merriam-Webster dictionary as “the quality or condition of being secluded from the presence or view of others”, or “the state of being free from public attention or unsanctioned intrusion.” This definition might make it seem like privacy is something that’s either “on or off,” but this is an oversimplification. If we have data that’s not viewable by anyone for any reason (not even the application that would use the data), that’s technically private but functionally useless for most applications. There is a lot of middle ground between data being completely open and completely closed. Because privacy in practical settings falls on a continuum instead of being binary, we need ways of measuring it.
Proxies and Metrics for Privacy
Measuring privacy is a separate matter from defining it. One review, by Isabel Wagner and David Eckhoff,2 classified many of the measures out there into categories: adversarial success, indistinguishability, data similarity, accuracy and precision, uncertainty, information gain/loss, and time spent.
Adversarial success
Assume that some kind of hostile party (we’ll refer to them as the adversary) wants to get the contents of whatever data we have or communication we’re sending or receiving. We don’t want them to see or piece together that information. What are the adversary’s chances of succeeding?
This is a very general category of privacy metrics. We don’t know the goals, knowledge, capabilities, or tools at the adversary’s disposal. The adversary could be anyone or anything: a curious user, a corporate spy, a nation-state spy, a lone thief, a company’s preventative penetration tester, or even a DEFCON conference attendee who will just mock you for not securing your data or communication correctly.3 The adversary could be a complete outsider with no knowledge of the technical backend, or they could already know exactly what protocols or techniques you’re using.
Given how vague and open-ended this metric is, there are other definitions that build on this concept of anticipating an attack from an adversary.
Indistinguishability
Indistinguishability refers to how well an adversary can distinguish between two entities in a process or dataset. This definition of privacy is the focus of private ML techniques like differential privacy (see "Differential Privacy").
Data similarity
Privacy definitions based on data similarity focus on how easily the features and subgroups within the data can be separated (e.g., distinguishing one person's records from another person's). This is the focus of ML privacy techniques like k-anonymity (which we discuss in "k-Anonymity").
Accuracy and precision
Accuracy-based metrics of privacy focus on the accuracy and precision of an adversary’s estimate of the data or communication. This could involve using metrics like F1 scores, precision, or recall to gauge how closely the adversary has estimated the bits of information of your data. If the adversary’s estimate is less accurate, the privacy is greater.
Uncertainty
Metrics of uncertainty assume that greater uncertainty means an adversary has a lesser chance of violating privacy promises. The greater the degree of error or entropy in the adversary’s estimate of the true information, the more private it is. This shares some similarities with accuracy-based metrics, though they should not be confused. Accuracy is the proximity of a reading to its actual value, whereas uncertainty relates to the outliers and anomalies that may skew accuracy readings.
Information gain/loss
Information gain/loss metrics measure how much an adversary can gain or lose from the data. If less information can be gained, then privacy is greater. This metric differs slightly from uncertainty, since it takes into account how much information the attacker had at the start.
Time spent
A motivated adversary may try to violate privacy repeatedly until they succeed. Definitions of privacy based on time assume that privacy mechanisms will inevitably fail, but certain privacy protection mechanisms would require more time-investment to break than others. ML privacy techniques like homomorphic encryption (which we cover in “Homomorphic Encryption”) work with this definition of privacy.
Legal Definitions of Privacy
The aforementioned proxies and metrics for privacy are good ways of assigning a number to how private a system is. While this landscape is useful, we need to know where to draw lines—but part of that decision may already be made for you. If you’re releasing any machine learning–based product, you are by definition going to be dealing with someone’s data. As such, you will invariably run into the boundaries of privacy laws.
k-Anonymity
The concept of k-anonymity, first proposed by Pierangela Samarati and Latanya Sweeney in 1998,4 can be thought of as a specific version of “hiding in the crowd.” It relies on increasing the uncertainty that a given record belongs to a certain individual. For this to happen, a dataset needs at least k individuals who share common values for the set of attributes that might identify them. K-anonymity is a powerful tool when used correctly. It’s also one of the precursors to more advanced privacy tools like differential privacy (discussed in “Differential Privacy”).
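For intuition, here is a minimal sketch of how you might measure the k of a dataset with pandas: group the records by their quasi-identifier columns and take the size of the smallest group. The column names here are invented for illustration.

import pandas as pd

records = pd.DataFrame({
    "zip_code":  ["02139", "02139", "02139", "10001", "10001"],
    "age_range": ["30-40", "30-40", "30-40", "20-30", "20-30"],
    "diagnosis": ["flu", "cold", "flu", "asthma", "flu"],
})

# Quasi-identifiers: attributes that could be linked with outside data to re-identify someone
quasi_identifiers = ["zip_code", "age_range"]

# k is the size of the smallest group of records sharing the same quasi-identifier values
k = records.groupby(quasi_identifiers).size().min()
print(f"This dataset is {k}-anonymous with respect to {quasi_identifiers}")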
Types of Privacy-Invading Attacks on ML Pipelines
You should have a good conceptual overview now of what privacy is from a machine learning perspective, why it’s important, and how it can be violated in an ML pipeline. When it comes to violating privacy, outside attackers have a variety of tools at their disposal. The biggest general categories of attacks are membership attacks (identifying the model’s training data), model inversion (using the model to steal proprietary data), and model theft (exactly what it sounds like).
Membership Attacks
One of the privacy risks of machine learning models is that an adversary may be able to reconstruct the data used in the model's creation.5 A membership inference attack is the process of determining whether a sample comes from the training dataset of a trained ML model. For the company whose ML model is being attacked, this could mean an adversary gaining insight into how a proprietary model was constructed, or even where the input data is located (especially risky if it is coming from a poorly secured external server).
There are three main models used in a membership attack:
- The target model: This is the model trained on the initial dataset. The model outputs confidence levels for each class, and the class with the highest confidence value is the chosen output. The membership inference attack is based on the idea that samples from the training dataset will have a higher average confidence value in their actual class than samples not seen during training.6
- The shadow model: In black-box conditions, an attacker cannot do statistical analysis on the confidence levels because they do not have access to the training dataset. The shadow models are an ensemble of models (which may or may not be exact copies of the target model's architecture and hyperparameters) designed to mimic the behavior of the target model. Once the shadow models have been trained, the attacker can generate training samples for the attack models.
- The attack model: This is the model that predicts whether a sample is from the training set or not. The inputs for the attack models are the confidence levels, and the output label is either "in" or "out." (A minimal sketch of this idea follows the list.)
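To make the signal behind these models concrete, here is a minimal sketch of the simplest possible membership heuristic: guess "member" whenever the target model is very confident in the true class. A real attack trains shadow and attack models as described above; this toy version, with invented names, only illustrates the underlying intuition.

import numpy as np

def confidence_threshold_attack(model_probs, true_labels, threshold=0.9):
    """Guess that a sample was a training-set member when confidence in its true class is high.

    model_probs: array of shape (n_samples, n_classes) of softmax outputs from the target model
    true_labels: array of shape (n_samples,) of integer class labels
    Returns a boolean array where True means "guessed member."
    """
    confidence_in_true_class = model_probs[np.arange(len(true_labels)), true_labels]
    return confidence_in_true_class >= threshold

# Toy example: the first sample looks like a training member, the second does not
probs = np.array([[0.97, 0.03], [0.55, 0.45]])
labels = np.array([0, 0])
print(confidence_threshold_attack(probs, labels))  # [ True False]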
Unless the defender has a very specific unfavorable setup, a membership inference attack is an illusory threat.7 This is especially so compared to attacks that are much better at stealing training data, or even the machine learning model itself.
Model Inversion
Membership attacks might seem low risk for most ML applications; most early attacks of this type were too time-consuming for what little information they yielded. But there are far more dangerous variants. A reconstruction attack takes membership attack principles further by reconstructing the data used to train an ML model. This can be used to directly steal the information of individuals whose data was used in training, or to reveal enough about how a model interprets data to find further ways of breaking it.
First proposed in 2015,8 a model inversion attack is a much more direct way of stealing data. Rather than determining whether an input is part of a dataset, this kind of attack reconstructs very exact representations of actual data. In the original paper, this technique was used on a classifier trained on several faces (see Figure 1-1). Rather than exhausting every possible pixel value that could belong to an individual, this technique used gradient descent to train pixel values to match one of the model’s output classes.
It’s worth noting that the data that is returned by this model inversion attack is an average representation of the data that belongs to the specific class in question. In the presented setting, it does not allow for an inversion of individual training data points. In Fredrikson et al., however, every individual within the face classifier represents their own class. Therefore, the attack can be used in order to retrieve information about individuals and violate their privacy. This is especially the case in applications like facial recognition, where you only need a face that triggers the same keypoint recognition, not one that looks like an actual face.
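Below is a heavily simplified PyTorch sketch of the core idea: treat the input pixels as trainable parameters and run gradient descent to maximize the model's confidence in a chosen class. It assumes some already-trained classifier named target_model, and it omits the regularization and post-processing that real inversion attacks need to produce recognizable images.

import torch

def invert_class(target_model, target_class, input_shape=(1, 3, 64, 64), steps=500, lr=0.1):
    """Gradient-descend on the input pixels to maximize one of the model's output classes."""
    target_model.eval()
    x = torch.zeros(input_shape, requires_grad=True)  # start from a blank image
    optimizer = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = target_model(x)
        loss = -logits[0, target_class]  # minimizing the negative logit pushes x toward the class
        loss.backward()
        optimizer.step()
        x.data.clamp_(0, 1)  # keep pixel values in a valid range
    return x.detach()

# reconstructed = invert_class(target_model, target_class=0)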
Model inversion attacks have gotten much more sophisticated since 2015, when this technique was first demonstrated. Even worse, the concepts of model inversion have been extended to steal much more than just the data.
Model Extraction
Model extraction goes several steps further. Instead of just reconstructing the input data for a model, a model extraction attack involves stealing the entire model. This kind of attack was first described in “Stealing Machine Learning Models via Prediction APIs.”9 Model theft attacks can range from just stealing the hyperparameters of a model,10 to outright stealing the model weights.11 Figure 1-2 gives a general overview of the model-stealing attack. The attacker approximates the gradients that would cause the model to output the predictions it’s giving.
Developing high-performing models is expensive. More than just the computational cost (which can be millions of dollars for some models), there’s also the cost of acquiring the massive and likely private dataset. Devising a novel training method is also intellectually taxing. With all this in mind, a malicious actor may decide to just extract the model itself.
The model extraction process typically involves three steps:
1. Gathering a dataset to query the victim model
2. Recording predictions from the API on these data points
3. Training a surrogate model to mimic the victim
There can be enormous variation in this attack pattern. Early model extraction attacks were highly dependent on which dataset was chosen for the queries. The surrogate model could have wildly different accuracy depending on whether CIFAR10, CIFAR100, or MNIST was chosen, for example. More recent attack mechanisms forgo this choice of dataset altogether by feeding in noise from controlled probability distributions. Different choices of probability distribution can change the number of queries needed in step 2 to approach a satisfactory surrogate. In step 3, an attacker may know nothing about the model architecture (i.e., a "black-box" attack), or they may have some details about the architecture (i.e., a "gray-box" attack).
The end result is still the same: the surrogate is trained using gradient approximation, driven by how similar the surrogate's output probabilities are to the victim model's output probabilities.
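As a rough sketch of step 3, the surrogate can be trained to minimize the divergence between its output distribution and the probabilities recorded from the victim, much like knowledge distillation. The names below are placeholders rather than any particular published attack.

import torch
import torch.nn.functional as F

def surrogate_training_step(surrogate, optimizer, queries, victim_probs):
    """One step that pulls the surrogate's output distribution toward the victim's.

    queries: a batch of inputs that were sent to the victim API
    victim_probs: the victim's output probabilities recorded for those inputs
    """
    optimizer.zero_grad()
    surrogate_log_probs = F.log_softmax(surrogate(queries), dim=-1)
    # KL divergence between the victim's distribution and the surrogate's
    loss = F.kl_div(surrogate_log_probs, victim_probs, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()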
If you have access to a computer vision model's output logits, this information leakage has enormous potential for abuse.12 These techniques take advantage of fundamental properties of convolutional neural networks, which means that any kind of pipeline that uses them is at risk, not just pipelines that train on images. Nor are such attacks limited to convolutional architectures, as was seen in the case of graph neural networks in "Model Extraction Attacks on Graph Neural Networks: Taxonomy and Realization."13
Much of what makes computer vision models vulnerable is the reuse of common architectures. Common machine learning libraries contain pre-built versions of networks like ResNet and InceptionV3 (see the PyTorch and Keras Model Zoos). Even worse, many of these models can be loaded with ImageNet weights. Fine-tuning a computer vision model gives potential attackers much more information to work with when stealing model weights. An attacker has the starting conditions of the weights and doesn’t need to reconstruct the architecture from scratch. Because of this partial foreknowledge of the neural network, some of these attacks are gray-box attacks.
Stealing a BERT-Based Language Model
This section was inspired by demonstrations of NLP model theft by the CleverHans team and the latest techniques for stealing weights from BERT models.14,15,16 In this section, we train a text classifier by taking a model pre-trained on public text data and fine-tuning it for a downstream task, and then show how that classifier can be stolen. The first step in this process is to train the BERT model.
Note
You can find all the code associated with this tutorial in the BERT_attack notebook.
When training a model with differential privacy, one almost always faces a trade-off between model size and accuracy on the task. The fewer parameters the model has, the easier it is to get a good performance with differential privacy.
Most state-of-the-art NLP models are quite deep and large (BERT-base has over 100 million parameters), which makes training text models on private datasets challenging. One way to address this problem is to divide the training process into two stages. First, the model is pre-trained on a public dataset, exposing the model to generic text data. Assuming that the generic text data is public, we will not be using differential privacy at this step. Then, most of the layers are frozen, leaving only a few upper layers to be trained on the private dataset using DP-SGD. This approach is the best of both worlds—it produces a deep and powerful text-understanding model, while only training a small number of parameters with a differentially private algorithm.
This tutorial will take the pre-trained BERT-base model and fine-tune it to recognize sentiment classification on the IMDB movie review dataset.17
!pip install nlp
!pip install transformers

from transformers import (
    BertForSequenceClassification,
    BertTokenizerFast,
    Trainer,
    TrainingArguments,
)
from transformers import glue_compute_metrics as compute_metrics
from nlp import load_dataset
import torch
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")


def tokenize(batch):
    return tokenizer(batch["text"], padding=True, truncation=True)


imdb_train_dataset, imdb_test_dataset = load_dataset("imdb", split=["train", "test"])

imdb_train_dataset = imdb_train_dataset.map(
    tokenize, batched=True, batch_size=len(imdb_train_dataset)
)
imdb_test_dataset = imdb_test_dataset.map(
    tokenize, batched=True, batch_size=len(imdb_test_dataset)
)
imdb_train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
imdb_test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art approach to various NLP tasks. It uses a transformer architecture and relies heavily on the concept of pre-training. We’ll use a pre-trained BERT-base model, provided in a HuggingFace transformers repository. It gives us a PyTorch implementation for the classic BERT architecture, as well as a tokenizer and weights pre-trained on Wikipedia, a public English corpus.
The model has the following structure. It uses a combination of word, positional, and token embeddings to create a sequence representation, then passes the data through 12 transformer encoders, and finally uses a linear classifier to produce the final label. As the model is already pre-trained and we only plan to fine-tune a few upper layers, we want to freeze all layers, except for the last encoder and above (BertPooler and Classifier). Figure 1-3 shows the BERT model’s architecture.
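The training code below does not show this freezing step explicitly, so here is a minimal sketch of one way to do it for the BertForSequenceClassification model loaded earlier. The module names follow the HuggingFace transformers implementation; adjust them if your model class differs.

trainable_layers = [
    model.bert.encoder.layer[-1],  # the last transformer encoder block
    model.bert.pooler,             # BertPooler
    model.classifier,              # the classification head
]

# Freeze everything, then unfreeze only the layers we intend to fine-tune
for param in model.parameters():
    param.requires_grad = False
for layer in trainable_layers:
    for param in layer.parameters():
        param.requires_grad = True

print(sum(p.numel() for p in model.parameters() if p.requires_grad), "trainable parameters")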
Thus, by using a pre-trained model, we reduce the number of trainable parameters from over 100 million to just above 7.5 million. This will help both performance and convergence with added noise. Here is the code that trains the model.
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    acc = accuracy_score(labels, preds)
    return {
        "accuracy": acc,
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    # evaluate_during_training=True,
    logging_dir="./logs",
)

trainer_vic = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=imdb_train_dataset,
    eval_dataset=imdb_test_dataset,
)
trainer_vic.train()
trainer_vic.evaluate()
The inference of this system is where our model theft opportunity lies. Let’s try to run inference on the Yelp polarity dataset.
_, origin_sample_test_dataset = load_dataset("yelp_polarity", split=["train", "test"])

sample_test_dataset = origin_sample_test_dataset.map(
    tokenize, batched=True, batch_size=len(origin_sample_test_dataset)
)
sample_test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])

# Query the victim model for its predictions on the Yelp samples; these confidence
# scores become the training labels for the copycat ("theft") dataset.
prediction_output = trainer_vic.predict(sample_test_dataset)


class ExtractDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {}
        item["attention_mask"] = torch.tensor(self.encodings[idx]["attention_mask"])
        item["input_ids"] = torch.tensor(self.encodings[idx]["input_ids"])
        item["label"] = torch.tensor(self.labels[idx].argmax(-1), dtype=torch.long)
        return item

    def __len__(self):
        return len(self.labels)


theft_train_dataset = ExtractDataset(sample_test_dataset, prediction_output.predictions)

theft_training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    # evaluate_during_training=True,
    logging_dir="./logs",
)

# Reuse the same metrics helper for the copycat model
compute_metrics_copycat = compute_metrics

trainer_extract = Trainer(
    model=model,  # note: a real attacker would instantiate a fresh surrogate model here
    args=theft_training_args,
    compute_metrics=compute_metrics_copycat,
    train_dataset=theft_train_dataset,
    # eval_dataset=imdb_test_dataset
)
trainer_extract.train()
trainer_extract.evaluate()
This training scheme will result in a model that produces outputs that behave very similarly to the original model.
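One quick way to sanity-check that claim is to compare the two models' predicted labels on a held-out set. The sketch below assumes the copycat was trained as a separate surrogate model and reuses the trainer objects defined above.

import numpy as np

# Compare victim and copycat predictions on the IMDB test split
victim_preds = trainer_vic.predict(imdb_test_dataset).predictions.argmax(-1)
copycat_preds = trainer_extract.predict(imdb_test_dataset).predictions.argmax(-1)

agreement = np.mean(victim_preds == copycat_preds)
print(f"Victim/copycat label agreement: {agreement:.2%}")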
Defenses Against Model Theft from Output Logits
If models can be reconstructed using their output logits alone, then this bodes poorly for model security. Fortunately, there are two modes of defending against this kind of inference attack.
The first type of defense is to make it costly to query the model. Xuanli He et al.18 explored the real-world use of public datasets to steal model weights. Based on the sizes of these datasets and the costs of Google and IBM’s language model APIs (assuming these are the lower bounds of an API call cost), they came up with the cost estimates shown in Table 1-1 for using those datasets to steal a BERT-based language model.
Dataset | Number of queries | Google price | IBM price |
---|---|---|---|
TP-US | 22,142 | $22.10 | $66.30 |
Yelp | 520 K | $520.00 | $1,560.00 |
AG | 112 K | $112.00 | $336.00 |
Blog | 7,098 | $7.10 | $21.30 |
Depending on the cloud provider, the cost of an attack could range from tens to thousands of dollars. The same research demonstrates that you wouldn’t even need to pick a matching transformer architecture to make a closely matching copycat model (e.g., training a DistilBERT model on the outputs of a BERT model is a viable attack option). As such, increasing the cost of a machine learning model API call will go far to protect against this kind of attack. (This was the strategy OpenAI took with GPT-3; thanks to the API call costs, the final price of mounting an inversion attack on the GPT-3 API would probably be more than that of training a GPT-3 model from scratch.)
There’s a second (and more clever) type of defense. Much like how obscuring your face with frosted glass would frustrate facial recognition, one can also add obfuscating noise to the output logits. You can either add in the output noise during the model training,19 or you can take an ordinary trained model and add random noise to the prediction probabilities afterward.20 This “prediction poisoning” additive noise is the strategy we’ll demonstrate in the next section.
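Here is a minimal sketch of that second idea: perturb the probabilities returned by a prediction API so that an attacker's gradient estimates become noisier, while the top-ranked label (and thus the normal user experience) is usually preserved. The noise scale is an arbitrary illustration, not a recommended setting.

import numpy as np

def poison_predictions(probs, noise_scale=0.05, rng=None):
    """Add obfuscating noise to output probabilities before returning them from an API."""
    rng = rng or np.random.default_rng()
    noisy = probs + rng.normal(0.0, noise_scale, size=probs.shape)
    noisy = np.clip(noisy, 1e-6, None)                 # keep probabilities positive
    return noisy / noisy.sum(axis=-1, keepdims=True)   # renormalize to a valid distribution

clean = np.array([[0.85, 0.10, 0.05]])
print(poison_predictions(clean))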
Note
Given how scary model theft is, and how creative attackers can be, this is an area of constant research. There are ways of spotting an attack in progress.21,22 There are also methods for “hardening” your training data samples.23 You can confuse model theft attacks further simply by using ensembles of models.24
All these proposed ideas and defense strategies can seem daunting if you’re trying to figure out the most important attack to defend against. This is especially the case if the research is very new and you haven’t heard of many successful real-world use cases. Ultimately, it may be worth simulating these attacks on your own system to see how they go.25,26
This is by no means a comprehensive assortment of attacks one could use to target an ML pipeline. As mentioned earlier, attackers will follow the path of least resistance. This will be made harder for attackers if you can incorporate some kind of privacy-testing tooling into your pipeline.
Privacy-Testing Tools
The Google Cloud Platform (GCP) has a tool for computing the k-anonymity of a given dataset. The exact computation method can be done from the GCP console, a GCP protocol, Java, Node.js, Python, Go, PHP, or C#. Further Python examples of this can be found on the Google python-dlp GitHub. Other Python modules for k-anonymization include:
- Nuclearstar/K-Anonymity: Clustering-based k-anonymity implementation
- qiyuangong/Clustering_based_K_Anon: Another clustering-based k-anonymity implementation
- qiyuangong/Mondrian: Python implementation of Mondrian multidimensional k-anonymity
- kedup/python-datafly: Python implementation of the Datafly algorithm for k-anonymity on tabular data
Additional privacy-testing tools include:
- PrivacyRaven, created by Trail of Bits
- TensorFlow Privacy, created by TensorFlow
- Machine Learning Privacy Meter, created by the NUS Data Privacy and Trustworthy Machine Learning Lab: a tool to quantify the privacy risks of machine learning models with respect to inference attacks, notably membership inference attacks
- CypherCat (archive-only), created by IQT Labs/Lab 41
- Adversarial Robustness Toolbox (ART), created by IBM
Methods for Preserving Privacy
Just as there are multiple ways to steal information from an ML model, there are multiple approaches for making that theft hard to the point where it’s impractical.
Differential Privacy
Differential privacy (DP) is a method for sharing insights about a dataset by using high-level patterns of subgroups within the data while masking or omitting data about specific individuals. The main assumption behind DP is that if the effect of making a single change in the data is small enough, then it’s difficult to reliably extract information about the individual from queries.
DP can be thought of as an extension of concepts like k-anonymity. The difference is that differential privacy is often extended to much higher-dimensional data. Most modern implementations draw on what's known as ε-differential privacy.
Suppose ε is a positive real number and A is a randomized algorithm that takes a dataset as input. D₁ and D₂ refer to any two datasets that differ by a change to just one element (e.g., the data of one person). The algorithm A provides ε-differential privacy if, for all such pairs of datasets D₁ and D₂, and for all subsets S of the possible outputs of A:

Pr[A(D₁) ∈ S] ≤ exp(ε) · Pr[A(D₂) ∈ S]
There are a variety of specific techniques for implementing differential privacy. These include additive noise mechanisms like the Laplace mechanism, randomized responses for local differential privacy, and feeding data through some kind of Hamming distance-preserving transformation. This formulation is designed to make sure that privacy is robust in the face of post-processing and that, if faced with highly correlated features, it can at least degrade gracefully and noticeably. Another bonus of differential privacy is its usefulness in defending against certain kinds of model extraction attacks.27
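As a concrete illustration, here is a minimal sketch of the Laplace mechanism for a counting query. The sensitivity of a count is 1 (adding or removing one person changes it by at most 1), so adding Laplace noise with scale 1/ε satisfies ε-differential privacy for that single query. The function below is our own illustration, not from any particular library.

import numpy as np

def dp_count(data, predicate, epsilon):
    """Return an epsilon-differentially private count of records matching a predicate."""
    true_count = sum(1 for record in data if predicate(record))
    sensitivity = 1.0  # one person's data changes a count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: a noisy count of how many people in a toy dataset are over 40
ages = [23, 45, 31, 62, 58, 29, 41]
print(dp_count(ages, lambda age: age > 40, epsilon=0.5))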
Stealing a Differentially Privately Trained Model
We've discussed concepts like differential privacy and resilience to model theft.28 Here we will examine exactly how one would go about stealing model weights in a scenario like this. We can take a network trained with differential privacy and then see how it stands up to various types of attacks. Let's take the BERT architecture from before and try training it using differential privacy.
Note
You can find all the code associated with this tutorial in the Chapter_1_PyTorch_DP_Demo notebook. Much of this was written shortly before the release of the most recent version of Opacus v1.1.0 and the most recent version of PyTorch v11.0.0. These interactive code tutorials will be adjusted to reflect the most recent versions in the final release. And be warned, they require a lot of RAM.
The main difference in this training, compared to our vanilla implementation, is that we're using the Opacus library from Meta. This is a library that lets us incorporate differential privacy into PyTorch models. We modify a typical PyTorch DataLoader-based training process by using Opacus's UniformWithReplacementSampler in the DataLoader and attaching the Opacus PrivacyEngine to the optimizer.
train_loader = DataLoader(
    train_dataset,
    num_workers=WORKERS,
    generator=generator,
    batch_sampler=UniformWithReplacementSampler(
        num_samples=len(train_dataset),
        sample_rate=SAMPLE_RATE,
        generator=generator,
    ),
    collate_fn=padded_collate,
    pin_memory=True,
)
Beyond the usual hyperparameters encountered in model training, DP introduces a privacy cost hyperparameter, which in turn benefits from larger batch sizes since the noise is scaled to the norm of one sample in the batch.
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE_TEST,
    shuffle=False,
    num_workers=WORKERS,
    collate_fn=padded_collate,
    pin_memory=True,
)
The trade-off to consider is that increasing the batch size relative to the amount of noise makes the privacy cost epsilon grow at roughly O(sqrt(batch_size)). Opacus also has a peak memory footprint of O(batch_size^2) compared to a non-differentially-private model. Fortunately, Opacus supports a hyperparameter called virtual_batch_size that can separate the gradient computation from the noise addition and parameter updates (at the cost of convergence and the privacy guarantee).
if SECURE_RNG:
    try:
        import torchcsprng as prng
    except ImportError as e:
        message = (
            "Need to install the torchcsprng package! "
            "Documentation: https://github.com/pytorch/csprng#installation"
        )
        raise ImportError(message) from e

    generator = prng.create_random_device_generator("/dev/urandom")
else:
    generator = None
Once the engine is built, we can train the model:
# Move the model to appropriate device
model = model.to(device)

# Set the model to train mode (HuggingFace models load in eval mode)
model = model.train()

optimizer = optim.Adam(model.parameters(), lr=LR)

if not DISABLE_DP:
    privacy_engine = PrivacyEngine(
        model,
        sample_rate=SAMPLE_RATE,
        alphas=[1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64)),
        noise_multiplier=SIGMA,
        max_grad_norm=MAX_PER_SAMPLE_GRAD_NORM,
        secure_rng=SECURE_RNG,
    )
    privacy_engine.attach(optimizer)

mean_accuracy = 0
for epoch in range(1, EPOCHS + 1):
    train(model, train_loader, optimizer, epoch)
    mean_accuracy = evaluate(model, test_loader)

if not DISABLE_DP:
    torch.save(mean_accuracy, "bert_imdb_class_dp.pt")
else:
    torch.save(mean_accuracy, "bert_imdb_class_nodp.pt")
For the test accuracy, you'll notice that the noise comes at a cost. The lower the epsilon (that is, the stronger the privacy guarantee and the more noise added), the more protected the input data is, and the less accurate the final model is. What value one chooses for epsilon comes down to how much model accuracy one is willing to sacrifice for the sake of privacy. There are unfortunately no free lunches when it comes to implementing differential privacy.
Further Differential Privacy Tooling
We've established several definitions of privacy and walked through differential privacy in detail, but there are many more tools out there. For privacy-preserving AI, the OpenMined project has by far the most extensive ecosystem of implementations for PyTorch-based models.29,30 While OpenMined has a lot of tools for the PyTorch ecosystem, there are plenty of other PyTorch-based tools such as Opacus (as we discussed).
IBM has its own set of DP tools, which can be found in IBM’s DP library. CleverHans for TensorFlow (and by extension, its Mr. Ed counterpart for PyTorch) has some of the most comprehensive tools for both DP and adversarial hardening. These include PATE, DP-SGD, Moments Accountant, Laplace and Exponential Mechanisms, and other such mechanisms we haven’t discussed here.
Homomorphic Encryption
Encrypting mission-critical data before storage is a standard best practice in any kind of high-stakes engineering. Homomorphic encryption (HE) is the conversion of data into ciphertext that can be analyzed and worked with as if it were still in its original form. The idea behind HE is to extend public-key cryptography by being able to run mathematical operations on the encrypted data without having access to the secret key. The output of the mathematical operation will still be encrypted. This technique has been in development for decades and may refer to one of several variants:
- Partially homomorphic encryption: The system can evaluate only one kind of encrypted operation (e.g., addition or multiplication).
- Somewhat homomorphic encryption: The system can evaluate two types of operations (e.g., both addition and multiplication), but only for a subset of the system.
- Leveled fully homomorphic encryption: The system can evaluate arbitrary computations made up of multiple layers of operations (though there are limits on how deeply these operations can be nested).
- Fully homomorphic encryption (FHE): The strongest (and ideal) form of HE. FHE allows the evaluation of arbitrary algorithms composed of multiple types of operations, with no restrictions on the depth of the nesting.
There are two big drawbacks to HE. The first is the need to carefully store the encryption keys responsible for encrypting and decrypting. This has been a problem in many other types of engineering for decades, and as such there is plenty of literature on how best to do this.31 The second is that HE brings an enormous computation cost. In the early days, this was on the order of making programs take millions of times longer. More recently it has been reduced to the order of hundreds of times longer. There are many approaches to applying HE to machine learning. These range from encrypting the data, to encrypting the neural network or decision tree, to encrypting some combination of both.
Like many privacy-preserving ML techniques, the OpenMined ecosystem has HE tools. These include TenSEAL, a Python library built on top of Microsoft's SEAL homomorphic encryption library.
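To give a flavor of what working with HE looks like, here is a minimal sketch using TenSEAL's CKKS scheme to add two encrypted vectors without ever decrypting them. The parameters are illustrative defaults rather than a security recommendation, and the exact API may differ slightly between TenSEAL versions.

import tenseal as ts

# Set up a CKKS context (scheme parameters chosen for illustration only)
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2**40
context.generate_galois_keys()

# Encrypt two vectors
enc_v1 = ts.ckks_vector(context, [1.0, 2.0, 3.0])
enc_v2 = ts.ckks_vector(context, [4.0, 5.0, 6.0])

# Add them while still encrypted
enc_sum = enc_v1 + enc_v2

# Only the holder of the secret key can decrypt the result
print(enc_sum.decrypt())  # approximately [5.0, 7.0, 9.0]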
Secure Multi-Party Computation
If full homomorphic encryption is limited by computational complexity, then the next best thing is secure multi-party computation (SMPC). The idea behind SMPC is having multiple parties compute a function on their inputs, all while keeping those inputs private. Rather than focusing on protection from an outside adversary or the protection of stored data, this privacy approach protects participants’ privacy from each other.
Consider the following workflow: one takes original data, represented by the number 12. Each party involved gets some share of the data (such as 5 or 7) and computes some operation on its share (e.g., "multiply by 3"). When the outputs are combined (5 × 3 + 7 × 3 = 36), the result is identical to the outcome of running the operation on the original data directly (12 × 3 = 36). If Party A and Party B are kept from knowing the final output 36, they cannot deduce the original data point 12. This is a super-simplified example based on additive shares, but now imagine this is a machine learning pipeline. Our original data is a bunch of user data instead of the number 12. Party A and B get shards or tranches of this data instead of the numbers 5 or 7. The operation they're running is certainly multiplication, but it's the large-scale matrix multiplication done when training a ResNet model. The goal behind SMPC is to be able to turn these outputs into a combined decision boundary.
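The same idea can be written out as a toy additive secret-sharing scheme. This is only a sketch to build intuition: real SMPC protocols work over finite fields and need extra machinery (such as Beaver triples) to multiply two secret values together, though multiplying by a public constant, as here, is straightforward.

import random

PRIME = 2**61 - 1  # work modulo a large prime so that individual shares reveal nothing

def make_shares(secret, n_parties=2):
    """Split a secret into additive shares that sum to the secret mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

secret = 12
shares = make_shares(secret)

# Each party multiplies its own share by the public constant 3 without ever seeing the secret
processed = [(share * 3) % PRIME for share in shares]

print(reconstruct(processed))  # 36, i.e., 12 * 3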
Being able to train models on aggregated data without allowing anyone access to that aggregated data would be extremely valuable, especially if the training data presents a bunch of security, privacy, policy, or legal risks. For example, medical researchers would be able to perform population studies on genetic data without needing to share data between research institutions. Being able to study the gender pay gap across companies would be much more tenable if salary data never actually left the companies in question.
Secure multi-party computation is sometimes used interchangeably with “remote execution” or “trusted execution.” These latter terms do not always describe secure multi-party computation, however. SMPC is a subset of “remote/trusted execution.” Full homomorphic encryption can be implemented within SMPC, but SMPC does not require it.
SMPC Example
For ML systems that make use of the PyTorch ecosystem, one can use Facebook Research’s CrypTen library. The goal of CrypTen is to ensure that the server-to-server interactions required for SMPC can be implemented with minimal friction.
Note
You can see the full code for this tutorial in the accompanying Jupyter Chapter_1_SMPC_Example notebook. This tutorial follows a pre-release version of OpenMined, and is based on code by Ayoub Benaissa (a prominent OpenMined contributor). The details will be finalized prior to publication, but until then this should not be used to secure important data. The code tutorial will be updated accordingly to demonstrate the best practices for the most up-to-date version of OpenMined until its release.
CrypTen was created with an “honest but curious” intruder in mind. Initially, it was built with internal participants in mind, not for protection against outside attackers. The OpenMined SMPC project extends CrypTen further, answering some of the unanswered questions in the original CrypTen announcement. Nothing changes about how CrypTen parties synchronize and exchange information. However, PySyft can be used to initiate the computation among workers, as well as exchange the final results between workers.32
import torch
import torch.nn as nn
import torch.nn.functional as F
import crypten
import syft
from time import time

torch.manual_seed(0)
torch.set_num_threads(1)
hook = syft.TorchHook(torch)

from syft.frameworks.crypten.context import run_multiworkers
from syft.grid.clients.data_centric_fl_client import DataCentricFLClient
For this deep dive, you'll need to install both PySyft and CrypTen. You should also install MNIST using the MNIST_utils from CrypTen. In addition, start two GridNodes with IDs 'ALICE' and 'BOB', listening on ports '3000' and '3001', respectively. You can do this by initializing GridNode in two separate terminals.
!pip install torch==1.8.0
!pip install syft==0.2.9
!pip install crypten
For this tutorial, we can define a simple neural network in standard PyTorch.
# Define an example network
class ExampleNet(nn.Module):
    def __init__(self):
        super(ExampleNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5, padding=0)
        self.fc1 = nn.Linear(16 * 12 * 12, 100)
        self.fc2 = nn.Linear(100, 2)

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu(out)
        out = F.max_pool2d(out, 2)
        out = out.view(-1, 16 * 12 * 12)
        out = self.fc1(out)
        out = F.relu(out)
        out = self.fc2(out)
        return out
You can now connect to ALICE and BOB via their respective ports, followed by preparing and sending the data to the different workers (this is just for demonstration; in a real-life implementation, the data would already be stored privately). If you're using different ports or running workers on a remote machine, you should update the URLs.
# Syft workers
print("[%] Connecting to workers ...")
ALICE = DataCentricFLClient(hook, "ws://localhost:3000")
BOB = DataCentricFLClient(hook, "ws://localhost:3001")
print("[+] Connected to workers")

print("[%] Sending labels and training data ...")

# Prepare and send labels
label_eye = torch.eye(2)
labels = torch.load("/tmp/train_labels.pth")
labels = labels.long()
labels_one_hot = label_eye[labels]
labels_one_hot.tag("labels")
al_ptr = labels_one_hot.send(ALICE)
bl_ptr = labels_one_hot.send(BOB)

# Prepare and send training data
alice_train = torch.load("/tmp/alice_train.pth").tag("alice_train")
at_ptr = alice_train.send(ALICE)
bob_train = torch.load("/tmp/bob_train.pth").tag("bob_train")
bt_ptr = bob_train.send(BOB)

print("[+] Data ready")
With the workers set up, instantiate your model and create a placeholder input for building the entire CrypTen model.
# Initialize model
placeholder_input = torch.empty(1, 1, 28, 28)
pytorch_model = ExampleNet()
Defining the CrypTen computation for training the neural network is relatively straightforward.
You only need to decorate your training loop function with the @run_multiworkers
decorator to run it across the different workers.
@run_multiworkers(
    [ALICE, BOB],
    master_addr="127.0.0.1",
    model=pytorch_model,
    placeholder_input=placeholder_input,
)
def run_encrypted_training():
    rank = crypten.communicator.get().get_rank()

    # Collect progress messages; remote workers' stdout isn't visible to the
    # master, so the strings are returned alongside the trained model instead
    printed = ""

    # Load the labels
    worker = syft.frameworks.crypten.get_worker_from_rank(rank)
    labels_one_hot = worker.search("labels")[0]

    # Load data:
    x_alice_enc = crypten.load("alice_train", 0)
    x_bob_enc = crypten.load("bob_train", 1)

    # Combine the feature sets: identical to Tutorial 3
    x_combined_enc = crypten.cat([x_alice_enc, x_bob_enc], dim=2)

    # Reshape to match the network architecture
    x_combined_enc = x_combined_enc.unsqueeze(1)

    # model is sent from the master worker
    model.encrypt()
    # Set train mode
    model.train()
    # Define a loss function
    loss = crypten.nn.MSELoss()

    # Define training parameters
    learning_rate = 0.001
    num_epochs = 2
    batch_size = 10
    num_batches = x_combined_enc.size(0) // batch_size

    for i in range(num_epochs):
        # Log once per epoch for readability
        if rank == 0:
            printed += f"Epoch {i} in progress:\n"

        for batch in range(num_batches):
            # define the start and end of the training mini-batch
            start, end = batch * batch_size, (batch + 1) * batch_size

            # construct AutogradCrypTensors out of training examples / labels
            x_train = x_combined_enc[start:end]
            y_batch = labels_one_hot[start:end]
            y_train = crypten.cryptensor(y_batch, requires_grad=True)

            # perform forward pass:
            output = model(x_train)
            loss_value = loss(output, y_train)

            # set gradients to "zero"
            model.zero_grad()

            # perform backward pass:
            loss_value.backward()

            # update parameters
            model.update_parameters(learning_rate)

            # Log progress every batch:
            batch_loss = loss_value.get_plain_text()
            if rank == 0:
                printed += f"\tBatch {batch + 1} of {num_batches} Loss {batch_loss.item():.4f}\n"

    model.decrypt()
    # printed contains all the progress strings generated during training
    return printed, model
You can now complete the distributed computation. This produces a dictionary containing the result from every worker, indexed by the rank of the party it was running. For instance, result[0] contains the result of party 0 that was running in 'alice', and result[0][i] contains the ith value, depending on how many values were returned.
print("[%] Starting computation")
func_ts = time()
result = run_encrypted_training()
func_te = time()
print(f"[+] run_encrypted_training() took {int(func_te - func_ts)}s")

printed = result[0][0]
model = result[0][1]
print(printed)
The model output is a CrypTen model, but you can use PySyft to share the parameters as long as the model is not encrypted.
cp = syft.VirtualWorker(hook=hook, id="cp")
model.fix_prec()
model.share(ALICE, BOB, crypto_provider=cp)
print(model)
print(list(model.parameters())[0])
Further SMPC Tooling
OpenMined has also been working on many non-ML applications of SMPC. For example, it has a demo project for using private set intersection to alert individuals that they’ve been exposed to COVID-19.
Federated Learning
Federated learning (FL) is a subset of secure multi-party computation.33 It can also be combined with other privacy-preserving ML techniques like differential privacy and HE. FL specifically refers to sending copies of a trainable model to wherever the data is located, training on this data at the source, and then recalling the training updates into one global model. At no point is the data itself aggregated into one database. Only the models, model updates, or pieces of the model are transferred.
Google used FL to improve text autocompletion in Android’s keyboard without exposing users’ text or uploading it to a cloud intermediary.34 Since 2019, Apple has been using FL to improve Siri’s voice recognition.35 As time goes on, more complex models have become trainable. Thanks to advances in offline reinforcement learning, it is also possible to do FL with reinforcement learning agents. FL is extremely attractive for any context where aggregating data is a liability, especially healthcare.
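At the heart of most FL systems is some form of federated averaging: each client trains locally, and only the resulting weights travel to the server, which averages them into the global model. Here is a minimal PyTorch sketch of that averaging step; the surrounding orchestration, client selection, and secure aggregation are omitted, and the weighting scheme is an assumption for illustration.

import copy
import torch

def federated_average(global_model, client_state_dicts, client_weights=None):
    """Average client model parameters into the global model (FedAvg-style)."""
    n = len(client_state_dicts)
    if client_weights is None:
        client_weights = [1.0 / n] * n  # e.g., proportional to each client's data size

    new_state = copy.deepcopy(global_model.state_dict())
    for key in new_state:
        # Weighted sum of each parameter tensor across clients
        new_state[key] = sum(w * sd[key].float() for w, sd in zip(client_weights, client_state_dicts))
    global_model.load_state_dict(new_state)
    return global_model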
FL can theoretically be implemented within CrypTen,36 but OpenMined has additional support for implementing federated learning in PyTorch.37 The TensorFlow ecosystem supports FL through TensorFlow Federated.
Warning
Technologies like differential privacy, FL, and SMPC are useful in general for stopping data leakage and securing ML models. However, this should not be confused with compliance with data privacy laws (some of which have specific lists of requirements, lists that do not mention any of these technologies yet). These technologies can help with compliance in some cases, but they do not grant automatic compliance, nor are they ever the only best security practice to use. For example, using FL in your ML pipeline is a good practice, but it will not automatically make you HIPAA compliant in the US.
Conclusion
You’ve learned that techniques like homomorphic encryption, federated learning, differential privacy, and secure multi-party computation are all different parts of the ML privacy stack (which itself is just one part of the cybersecurity space). These techniques encompass different areas in which data can leak, from data inputs to model parameters to decision outputs.
Several groups have begun combining these techniques. A recent collaboration between MIT, the Swiss Laboratory for Data Security, and several hospitals in Lausanne, Switzerland, demonstrated a real-world application combining federated learning, differential privacy, homomorphic encryption, and multi-party computation into a single analytics system (designated FAHME), shown in Figure 1-4.38
The collaborators used the FAHME system to conduct research in oncology and genetics. The purpose was to demonstrate that multiple institutions could collaborate without any one of them having access to the full data, without introducing any errors into the results. The final results were identical to those resulting from using the pooled dataset. The authors also showed that this is much easier and more accurate than using a meta-analysis, which involves working with summary statistics of datasets in the absence of the original data.
The core problem with a meta-analysis is getting around Simpson’s paradox, in which trends that appear in several groups of data disappear or reverse completely when the groups are combined. Correcting for Simpson’s paradox in a meta-analysis is difficult,39 but FAHME offers a promising alternative: skip the meta-analysis stage entirely and work directly with the pooled data in encrypted form. In a FAHME workflow, a querier submits a differentially private query to the FAHME system, which uses HE to compute the results, and the resulting analytics are combined using multi-party computation.
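To make the reversal concrete, here is a tiny example with made-up success counts (hypothetical numbers chosen only to exhibit the effect): the treatment looks better within each severity group, yet worse once the groups are pooled.

# (treatment successes, treatment total, control successes, control total)
groups = {
    "mild cases": (81, 87, 234, 270),
    "severe cases": (192, 263, 55, 80),
}

pooled = [0, 0, 0, 0]
for name, counts in groups.items():
    ts, tn, cs, cn = counts
    print(f"{name}: treatment {ts / tn:.0%} vs. control {cs / cn:.0%}")
    pooled = [p + c for p, c in zip(pooled, counts)]

ts, tn, cs, cn = pooled
print(f"pooled: treatment {ts / tn:.0%} vs. control {cs / cn:.0%}")
# Within each group the treatment wins (93% vs. 87%, 73% vs. 69%),
# but in the pooled data it loses (78% vs. 83%).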
This was a great real-world demonstration of the concepts discussed in this chapter. However, there’s much more to robust and trustworthy machine learning pipelines than just privacy.
1 This term is derived from “black box” and “white box” attacks. While some people are avoiding these terms out of sensitivity for the unconscious bias they can introduce around Blackness and Whiteness, we were unable to find a wholly suitable alternative for this book and we still recommend outside resources that use this terminology. We hope that calling your attention to the potential for bias will prevent the perpetuation of it.
2 Isabel Wagner and David Eckhoff, “Technical Privacy Metrics: A Systematic Survey”, ACM Computing Surveys (CSUR) 51, no. 3 (2018): 1–38.
3 See examples in this report on the Hacker Wall of Shame. You may also have heard preventative penetration testers called “white-hat” hackers, a name that comes from the white hats archetypically worn by protagonists in Western films.
4 Pierangela Samarati and Latanya Sweeney, “Protecting Privacy When Disclosing Information: K-Anonymity and Its Enforcement Through Generalization and Suppression”, 1998.
5 Membership inference attacks were first described in Reza Shokri et al., “Membership Inference Attacks Against Machine Learning Models”, 2017 IEEE symposium on security and privacy (SP), (2017): 3–18.
6 Shokri et al., “Membership Inference Attacks Against Machine Learning Models,” 3–18.
7 For more on why membership inference attacks are particularly high-risk, low-reward, see Paul Irolla, “Demystifying the Membership Inference Attack”, Disaitek, September 19, 2019.
8 Matt Fredrikson et al., “Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures”, Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (2015): 1322–33.
9 Florian Tramèr et al., “Stealing Machine Learning Models via Prediction APIs”, 25th USENIX Security Symposium (USENIX Security 16) (2016): 601–18.
10 Binghui Wang and Neil Z. Gong, “Stealing Hyperparameters in Machine Learning”, 2018 IEEE Symposium on Security and Privacy (SP) (2018): 36–52.
11 Antonio Barbalau et al., “Black-Box Ripper: Copying Black-Box Models Using Generative Evolutionary Algorithms”, Advances in Neural Information Processing Systems 33 (2020). For the full code, visit GitHub.
12 J. R. Correia-Silva et al., “Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data”, 2018 International Joint Conference on Neural Networks (IJCNN), (2018): 1–8.
13 Bang Wu et al., “Model Extraction Attacks on Graph Neural Networks: Taxonomy and Realization”, Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security (2022): 337–50.
14 Kalpesh Krishna and Nicolas Papernot, “How to Steal Modern NLP Systems with Gibberish”, cleverhans-blog, 2020.
15 See the CleverHans team’s code example.
16 Xuanli He et al., “Model Extraction and Adversarial Transferability, Your BERT Is Vulnerable!”, CoRR, vol. abs/2103.10013 (2021); extraction and transfer code available on GitHub.
17 Samuel R. Bowman et al., “A Large Annotated Corpus for Learning Natural Language Inference”, arXiv preprint (2015). The project page includes papers that use this along with download links.
18 Xuanli He et al., “Model Extraction and Adversarial Transferability, Your BERT Is Vulnerable!”, arXiv preprint (2021).
19 Yuto Mori et al., “BODAME: Bilevel Optimization for Defense Against Model Extraction”, arXiv preprint (2021).
20 Tribhuvanesh Orekondy et al., “Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks”, arXiv preprint (2019). Code example available on GitHub.
21 Soham Pal et al., “Stateful Detection of Model Extraction Attacks”, arXiv preprint (2021).
22 Zhanyuan Zhang et al., “Towards Characterizing Model Extraction Queries and How to Detect Them”, Research Project, University of California, Berkeley, 2021.
23 Amir Mahdi Sadeghzadeh et al., “Hardness of Samples Is All You Need: Protecting Deep Learning Models Using Hardness of Samples”, arXiv preprint (2021).
24 Sanjay Kariyappa et al., “Protecting DNNs From Theft Using an Ensemble of Diverse Models” (2020).
25 Mika Juuti et al., “PRADA: Protecting Against DNN Model Stealing Attacks”, 2019 IEEE European Symposium on Security and Privacy (EuroS&P), (2019): 512–27.
26 Chen Ma et al., “Simulating Unknown Target Models for Query-Efficient Black-Box Attacks”, arXiv preprint (2020). The code is available on GitHub.
27 Huadi Zheng et al., “Protecting Decision Boundary of Machine Learning Model with Differentially Private Perturbation”, IEEE Transactions on Dependable and Secure Computing (2020): 2007–22.
28 For an example, see Google’s differential privacy GitHub repo.
29 See Lex Fridman’s slides on the project.
30 Adam James Hall et al., “Syft 0.5: A Platform for Universally Deployable Structured Transparency”, arXiv preprint (2021).
31 Aaron Rinehart and Kelly Shortridge, “Security Chaos Engineering”, (O’Reilly, 2020).
32 More of the best practices and philosophies of the PySyft library are detailed in Alexander Ziller et al., “PySyft: A Library for Easy Federated Learning,” in Federated Learning Systems, edited by Muhammad Habib ur Rehman and Mohamed Medhat Gaber, 111–39. New York: Springer, 2021.
33 If you want to get into the exact taxonomy, see Huafei Zhu et al., “On the Relationship Between (Secure) Multi-Party Computation and (Secure) Federated Learning”, DeepAI.org, 2020.
34 Brendan McMahan and Daniel Ramage, “Federated Learning: Collaborative Machine Learning Without Centralized Training Data”, Google Research (blog), April 6, 2017.
35 Karen Hao, “How Apple Personalizes Siri Without Hoovering up Your Data”, MIT Technology Review, December 11, 2019.
36 David Gunning et al., “CrypTen: A New Research Tool for Secure Machine Learning with PyTorch”, MetaAI, October 10, 2019.
37 OpenMined has a blog on federated learning.
38 David Froelicher et al., “Truly Privacy-Preserving Federated Analytics for Precision Medicine with Multiparty Homomorphic Encryption”, Nature Communications 12, no. 1 (2021): 1–10.
39 For example, see Gerta Rücker and Martin Schumacher, “Simpson’s Paradox Visualized: The Example of the Rosiglitazone Meta-Analysis”, BMC Medical Research Methodology 8, no. 34 (2008).