Chapter 6. Methods for Generating Adversarial Perturbation

Chapter 5 considered the principles of adversarial input, but how are adversarial examples generated in practice? This chapter presents techniques for generating adversarial images and provides some code for you to experiment with. In Chapter 7 we’ll then explore how such methods might be incorporated into a real-world attack where the DNN is part of a broader processing chain and the adversary has additional challenges, such as remaining covert.

Open Projects and Code

There are several initiatives to bring the exploration of adversarial attacks and defenses into the public domain, such as CleverHans, Foolbox, and IBM’s Adversarial Robustness Toolbox. These projects are detailed further in Chapter 10.

For consistency, all the code in this book uses the Foolbox library.
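
As a concrete illustration of what that looks like, here is a minimal setup sketch, assuming the Foolbox 3.x API with a PyTorch backend; the pretrained ResNet-18, the preprocessing values, and the sample batch size are placeholder choices for whichever trained model you want to experiment with:

import foolbox as fb
import torchvision.models as models

# Load a trained classifier; a pretrained ImageNet ResNet-18 is used
# here purely as a placeholder.
model = models.resnet18(pretrained=True).eval()

# Wrap the model so Foolbox knows the valid pixel range and the
# normalization the model expects (axis=-3 means channels-first).
preprocessing = dict(mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225], axis=-3)
fmodel = fb.PyTorchModel(model, bounds=(0, 1), preprocessing=preprocessing)

# Fetch a small batch of sample images bundled with Foolbox and check
# the model's accuracy on them before attempting any attack.
images, labels = fb.utils.samples(fmodel, dataset="imagenet", batchsize=4)
print(fb.utils.accuracy(fmodel, images, labels))

Wrapping the model this way gives every attack in the library one consistent interface for querying predictions and gradients, whatever the underlying framework.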

Before considering the methods for creating adversarial input, you might wonder how difficult it would be to create an adversarial example simply by trial and error. You might, for example, add some random perturbation to an image and see what effect it has on the model’s predictions. Unfortunately for an adversary, it isn’t quite so simple. During its learning phase, the DNN will have generalized from the training data, so it is likely to be resilient to small random perturbations; such changes are therefore unlikely to succeed. Figure 6-1 illustrates that even when every pixel color value has been incrementally perturbed by a significant random amount, the model’s predictions remain essentially unchanged.
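
You can try this trial-and-error approach yourself. The following sketch reuses the fmodel, images, and labels from the setup shown earlier; the noise magnitudes are arbitrary values chosen for illustration:

import torch

# Take a single image from the sample batch as the starting point.
image, label = images[:1], labels[:1]

# Add uniform random noise of increasing magnitude and check whether
# the model's predicted class ever changes.
for epsilon in [0.01, 0.05, 0.1, 0.2]:
    noise = (torch.rand_like(image) * 2 - 1) * epsilon  # uniform in [-eps, eps]
    perturbed = (image + noise).clamp(0, 1)             # stay in valid pixel range
    predicted = fmodel(perturbed).argmax(dim=-1)
    print(f"epsilon={epsilon}: predicted {predicted.item()}, true {label.item()}")

Typically the top prediction survives even quite visible noise, which is exactly the resilience that generalization provides; the more targeted methods presented in the rest of this chapter are needed to find the specific directions in which the model is fragile.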
