Chapter 1. Introduction
This book is concerned with deep neural networks (DNNs), the deep learning algorithms that underpin many aspects of artificial intelligence (AI). AI covers the broad discipline of creating intelligent machines that mimic human intelligence capabilities such as the processing and interpretation of images, audio, and language; learning from and interacting with unpredictable physical and digital environments; and reasoning about abstract ideas and concepts. While AI also exploits other methods such as the broader field of machine learning (ML) and traditionally programmed algorithms, the ability of deep learning to imitate human capabilities places DNNs central to this discipline. DNNs can mimic, and often exceed, human capability in many tasks, such as image processing, speech recognition, and text comprehension. However, this book is not about how accurate or fast DNNs are; it’s about how they can be fooled and what can be done to strengthen them against such trickery.
This introduction begins with a brief explanation of DNNs, including some history and the moment when it first became apparent that they might not always return the answer we expect. The chapter then goes on to explain what constitutes adversarial input and its potential implications in a society where AI is becoming increasingly prevalent.
A Shallow Introduction to Deep Learning
A DNN is a type of machine learning algorithm. In contrast to traditional software programs, these algorithms do not expose the rules that govern their behavior in explicitly programmed steps, but learn their behavior from example (training) data. The learned algorithm is often referred to as a model because it provides a model of the characteristics of the training data used to generate it.
DNNs are a subset of a broader set of algorithms termed artificial neural networks (ANNs). The ideas behind ANNs date back to the 1940s and 1950s, when researchers first speculated that human intelligence and learning could be artificially simulated through algorithms (loosely) based on neuroscience. Because of this background, ANNs are sometimes explained at a high level in terms of neurobiological constructs, such as neurons and the axons and synapses that connect them.
The architecture (or structure) of an ANN is typically layered, ingesting data into a first layer of artificial “neurons” that cause connecting artificial “synapses” to fire and trigger the next layer, and so on until the final neuron layer produces a result. Figure 1-1 is an extreme simplification of the highly advanced artificial neural processing performed by Deep Thought in The Hitchhiker’s Guide to the Galaxy, by Douglas Adams (1979). It takes in data and returns the meaning of life.1
A DNN learns its behavior—essentially the circumstances under which, and the extent to which, the synapses and neurons should fire—from examples. Examples are presented in the form of training data, and the network’s behavior is adjusted until it behaves in the way that is required. The training step to create a DNN is classified as “deep” learning because, in contrast to simpler ANNs, DNN models comprise multiple layers of neurons between the layer that receives the input and the layer that produces the output. DNNs are used when the data or problem is too complex for simpler ANNs or more traditional ML approaches.
Like any other ML algorithm, a DNN model simply represents a mathematical function. This is a really important point to grasp. Depicting it in terms of connected neurons makes the concepts easier to understand, but you won’t see references to neurons or synapses in the software implementing a neural network.
The mathematics underpinning DNNs is particularly powerful, enabling a model to approximate any mathematical function. So, given enough data and compute power, a trained DNN can learn how to map any set of (complex) input data to a required output. This makes deep learning particularly effective in understanding data that is unstructured or where the key features are difficult to discern. DNN models have proved effective at, for example, image processing, language translation, speech understanding, weather forecasting, and financial market prediction. Perhaps even more remarkably, DNNs can also be trained to generate data (such as realistic images or text) in a way that appears to mimic human creativity. Advances in DNNs have opened astonishing opportunities for complex computational tasks, and these networks are becoming prevalent in many areas of our society.
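To make the “just a mathematical function” point concrete, here is a minimal sketch of a tiny feedforward network written in plain Python with NumPy. The layer sizes and random weights are purely illustrative and do not come from any particular library or model; notice that the code contains no “neurons” or “synapses,” only repeated matrix multiplication followed by a simple nonlinearity.

```python
import numpy as np

def relu(x):
    # Rectified linear unit: the nonlinearity applied between layers
    return np.maximum(0.0, x)

def tiny_network(x, weights, biases):
    # A "deep" network is just repeated matrix multiplication plus a
    # nonlinearity; nothing in the code refers to neurons or synapses.
    activation = x
    for w, b in zip(weights[:-1], biases[:-1]):
        activation = relu(activation @ w + b)
    # Final layer produces one raw score per output class
    return activation @ weights[-1] + biases[-1]

# Illustrative only: a 4-value input, two hidden layers of 8, 3 output classes
rng = np.random.default_rng(seed=0)
layer_shapes = [(4, 8), (8, 8), (8, 3)]
weights = [rng.normal(scale=0.5, size=shape) for shape in layer_shapes]
biases = [np.zeros(shape[1]) for shape in layer_shapes]

scores = tiny_network(rng.normal(size=4), weights, biases)
print(scores)  # three untrained, meaningless class scores
```

Real DNNs differ mainly in scale, in the structure of their layers, and in the fact that their weights are learned from training data rather than drawn at random.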
A Very Brief History of Deep Learning
At the start of this century, deep learning and neural networks were a niche field restricted to specialist researchers. DNNs were primarily theoretical and difficult to implement. The key problem in the practical realization of DNN technologies is that training a DNN model (essentially teaching the algorithm so that it works correctly) requires vast amounts of training data and a computationally expensive training process. In addition, the training data often needs to be labeled; that is, the correct answer for each training example must be available and associated with it. For example, every image in an image training dataset would require some associated data to say what it contains, and possibly where that content is located within the image.
Training a DNN to perform a complex task, such as vision recognition, typically requires tens of thousands or even millions of training examples, each of which is correctly labeled. Machine learning enthusiasts realized early on that assembling a sufficiently large amount of labeled data would be a mammoth undertaking. At the turn of the century, however, the growth of the internet suddenly made acquiring this training data possible. Internet giants such as Google and Facebook began to exploit the vast oceans of data available to them to train models for a whole raft of business uses, such as language translation. Meanwhile, researchers initiated crowdsourced projects to label training datasets by hand. A groundbreaking example is ImageNet (see ImageNet), a project that was a core enabler to the development of DNN technology for computer vision.
ImageNet
ImageNet is a database of links to images created for advancing the field of machine vision. It contains links to over 14 million images, each assigned to one or more categories depending on image content. The database is organized in a hierarchical structure to enable varying levels of generalization (for example, “dog” or, more specifically, the type of dog, such as “labrador”). The ImageNet project exploited crowdsourcing to hand-annotate each image with the appropriate labels.
Since 2010, this project has run its annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) to advance research in the domain of visual object recognition software.
Hardware technology was advancing too. In particular, graphics processing units (GPUs), originally developed for computer graphics and image processing (particularly for gaming), enabled the fast matrix computation required for training DNNs. From around 2010, the practical development of DNNs became possible. Soon they were reaching accuracies and speeds on a par with, or even surpassing, human capability in areas of AI such as visual comprehension and speech translation.
AI “Optical Illusions”: A Surprising Revelation
While the accuracy and progress of DNNs were being celebrated, in 2013 researchers Szegedy et al. published a paper, “Intriguing Properties of Neural Networks,”2 that was presented at the International Conference on Learning Representations (ICLR) the following year. This paper exposed the fact that deep learning algorithms could be “fooled” into emitting incorrect results.
The particular algorithms under scrutiny were DNNs for image classification. These take an image as input and classify it in terms of its most likely prevalent content—for example, the image might be classified as a “table” if a table were the primary item in the picture. While these neural networks were widely regarded as state of the art in terms of image classification, they made surprising mistakes when presented with images that had been intentionally sprinkled with small pixel changes that were imperceptible to humans. To a person, the image looked unchanged, but these minor modifications caused the neural networks to make significant errors in classification.
Figure 1-2 shows three examples of misclassified images that were presented in the paper. In the lefthand column are the originals that were correctly classified by the DNN algorithm. The images in the center column depict the “adversarial perturbations” created specifically for the original images on the left. These perturbations were then reduced by multiplying each pixel change by a fraction. When the reduced (less visible) perturbation was added to the original image, the image on the right was generated. All the images in the righthand column were misclassified by the same algorithm as an ostrich (specifically “ostrich, Struthio camelus”), despite appearing to human eyes to be the same as the originals.
This was intriguing, not only to those working in AI, but also to those with no detailed background in intelligent machines. The fact that DNNs could potentially be so easily fooled captured the interest of the popular press. In some articles the concept was described as “optical illusions” for AI.
Perhaps we assumed that because neural networks were inspired by neuroscience and appeared to mimic aspects of human intelligence so effectively, they “thought” like humans. As deep neural network concepts were initially inspired by a simplification of synapses and neurons within the brain, it would not be unreasonable to assume that DNNs interpreted images in a way that was similar to the brain’s visual cortex—but this is not the case. They clearly do not extract the abstract features that humans use to classify images, but use different rules altogether. From the perspective of those working on neural network technologies, understanding how these algorithms can be fooled has provided insight into the algorithms themselves.
Since the initial Szegedy et al. paper, this vulnerability to trickery has been proven for other modalities such as speech and text, indicating that it is not restricted to DNNs that process image data, but is a phenomenon applicable to DNN technologies more broadly. In a world becoming increasingly reliant on DNNs, this was big news.
What Is “Adversarial Input”?
In the domain of image processing, the concept of adversarial input has been likened to creating optical illusions to which only AI is susceptible. An adversarial image might be generated by sprinkling seemingly unimportant pixel changes across an image of a cat, causing the AI to classify the image as a dog without introducing any noticeable features that a person would discern as dog-like. Adversarial input could also be some marks on a road sign that we would interpret as graffiti, but which could cause an autonomous vehicle to misinterpret the sign. Examples also extend to audio, such as the inclusion of inaudible adversarial commands within speech that fundamentally change its interpretation by an automatic speech recognition system. All these scenarios are underpinned by DNN models.
The term adversarial example was first used by Szegedy et al. to describe examples such as those illustrated in Figure 1-2. This term can be defined as an input created with the intent to cause a model to return an incorrect result, regardless of whether the input actually succeeds in fooling the network or not. More commonly, the term’s usage is restricted to inputs that achieve the aim of confusing a network. In this book, the terms adversarial input and adversarial example are used interchangeably to mean input that successfully fools a network into producing predictions that humans would consider incorrect. In the context of this book, therefore, nonadversarial input is data that fails to fool the network, even if it was developed with adversarial intent.
Malware as Adversarial Input
There is also increasing interest in the application of neural networks to malware detection, as the complexity inherent in software and the ever-evolving nature of malware make it extremely difficult to explicitly articulate the features in software that might indicate a threat.
The term adversarial input is sometimes used to describe malware when the anti-malware software is implemented with machine learning. This is a logical use of the term, since malware is input crafted to cause the machine learned model to return an incorrect result of “benign.”
This book focuses on DNN models that process digital renderings of visual and auditory information; data that our biological brains process so easily. Image and audio data is continuous, comprising pixels or audio frequencies with a continuous spread of values. By contrast, other complex data, such as text, is discrete and not composed of quantifiable values. In the discrete domain it may be more challenging to create an adversarial example that will remain undetected because it is difficult to quantify a “small” change. For example, a small change to a word in some text may be overlooked as a misspelling, or be obvious if it results in a completely different meaning.
In the case of AI systems that are designed to process image or audio, an “incorrect” result does not necessarily mean that it differs from what a human might perceive. It is possible that an adversarial example will fool biological (human) intelligence too. This raises the question: do we want AI to interpret the world in the same way as we do? In the majority of cases, we won’t want it to mimic human thinking to the extent that it also includes the failings of human perception. Most of the adversarial examples discussed in this book will be ones that would not fool our human brains—our biological neural networks. These examples introduce interesting threat models and also emphasize the difference between artificial and human intelligence.
Although any ML algorithm is potentially at risk of adversarial input, DNNs may have greater susceptibility as they excel at tasks where it is difficult to establish what features in the data are worth learning—therefore, we may have little or no understanding of what aspects of the data are important to a DNN algorithm. If we don’t understand the aspects of the data that the algorithm uses in its decision making, what hope do we have of establishing good tests to assure the algorithm’s robustness? Adversarial inputs exploit the fact that deep learning models typically deal with millions of possible input variants based on only a very small proportion of learned examples.3 The learned models must be flexible enough to deal with complexity, but also generalize sufficiently for previously unseen data. As a result, the DNN’s behavior for most possible inputs remains untested and can often be unexpected.
The attack presented by Szegedy et al. is a perturbation attack. However, there are other methods of fooling DNNs. The following sections introduce some of the different categories of approaches and the key terminology used in the area of adversarial input.
Adversarial Perturbation
The examples presented in Figure 1-2 illustrate adversarial images generated by making carefully calculated changes to the original images, altering each pixel by a tiny amount. This is known as a perturbation attack. An alternative approach might be to alter a few carefully selected pixels more significantly. The number of pixels changed and the change per pixel might vary, but the overall effect remains sufficiently small as to not be noticeable to (or to be overlooked by) a human. The perturbation might appear random, but it’s not; each pixel has been carefully tweaked to produce the required result. Later in this book we’ll look at how such perturbations are calculated to produce adversarial input.
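As a brief preview of the kind of calculation involved, the sketch below illustrates one widely studied perturbation technique, the fast gradient sign method (FGSM), which is not the exact technique used by Szegedy et al. but captures the same idea: nudge every pixel a tiny amount in the direction that most increases the model’s loss for the correct label. It assumes a trained PyTorch classifier (`model`), a suitable loss function, a batched image tensor with pixel values in [0, 1], and a class-index label tensor; all the names are illustrative.

```python
import torch

def fgsm_example(model, loss_fn, image, true_label, epsilon=0.01):
    # Compute the gradient of the loss with respect to the input pixels
    image = image.clone().detach().requires_grad_(True)
    loss = loss_fn(model(image), true_label)
    loss.backward()
    # Move every pixel by at most +/- epsilon, so the change stays small
    perturbation = epsilon * image.grad.sign()
    adversarial = (image + perturbation).clamp(0.0, 1.0)  # keep valid pixels
    return adversarial.detach()
```

With a small enough epsilon the resulting image is visually indistinguishable from the original, yet it can be enough to change the model’s prediction.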
Adversarial perturbation is not unique to images. Similar techniques could be applied to audio, for example (see Figure 1-3). The principles here remain the same—small changes in the audio to confuse the DNN processing. However, whereas an adversarial image exploits the spatial dimension to introduce perturbation (to pixels), adversarial audio exploits perturbations to audio frequencies that are distributed across time. For example, subtle changes in voice frequency over the duration of a speech segment that are not noticeable to a human can cause speech-to-text models to misinterpret a spoken sentence.4
Unnatural Adversarial Input
In 2015, Nguyen et al. published a paper titled “Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images.”5 Their research demonstrated that when the realism of the actual content is not important, adversarial images can be generated to produce confident misclassifications from DNNs despite the fact that they do not resemble anything that would be seen in the natural world. Some of the examples presented in this paper are shown in Figure 1-4.
These images are a clear indication that DNNs can learn to interpret image data based on features that we as humans would not use. Clearly, these images are not going to fool anyone; however, they should not be dismissed. Examples such as these could be used by an adversary to force a system into making false-positive conclusions; for instance, flooding a system with such images could cause a denial of service.
Adversarial Patches
Rather than distributing change across the input to create the adversarial example, an alternative approach is to focus on one area and essentially “distract” the DNN from aspects of the data that it should be focusing on.
Adversarial patches are carefully created “stickers” that are added to the data. These patches have the effect of distracting the DNN from the relevant aspects of the input, causing it to produce the wrong answer. An example of a digitally generated adversarial patch created by Google researchers6 is shown in Figure 1-5. The sticker has been mathematically optimized so that, from the DNN’s perspective, it is a more salient feature than any object that exists in the real-world scene, thereby ensuring a confident misclassification.
The adversarial change is obvious to a human observer, but this might not matter, provided the patch does not affect a human’s interpretation of the image as a whole. For example, it might be placed at the outskirts of the image and potentially disguised as a logo. We would also not be fooled into believing that the scene contained a toaster rather than a banana, because the patch has been specifically designed to be salient to the DNN rather than to a human.
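To make the idea concrete, the following sketch shows how a precomputed digital patch might be pasted into a chosen region of an image. Generating the patch itself (the optimization step) is a separate process not shown here; the function, shapes, and stand-in arrays are illustrative assumptions, with images represented as NumPy arrays of shape (height, width, channels).

```python
import numpy as np

def apply_patch(image, patch, top, left):
    # Overwrite a rectangular region of the image with the adversarial
    # patch - e.g., near a corner, where a viewer may dismiss it as a logo
    patched = image.copy()
    patch_h, patch_w = patch.shape[:2]
    patched[top:top + patch_h, left:left + patch_w] = patch
    return patched

# Place a 50x50 stand-in patch in the top-left corner of a 224x224 RGB image
image = np.zeros((224, 224, 3), dtype=np.float32)     # stand-in image
patch = np.random.rand(50, 50, 3).astype(np.float32)  # stand-in patch
adversarial_image = apply_patch(image, patch, top=0, left=0)
```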
Let’s consider the same principle for audio. An audio patch might equate to a sound clip, short enough or perhaps quiet enough to be ignored by a human listener. In the same way that the adversarial image patch must be optimally sized and located spatially, the audio patch requires appropriate temporal location and intensity. The principle of adversarial patches for image and audio is illustrated in Figure 1-6.
An interesting feature of adversarial patches is that they may be more easily reused than adversarial perturbations. For example, a digitally generated adversarial sticker could be effective over multiple images, allowing it to be shared online or copied and stuck in multiple environments.
Adversarial Examples in the Physical World
The adversarial example illustrated in Figure 1-2 was generated by digital manipulation; in this case by changing pixel-level information across the images. However, this assumes the attacker has access to the digital format of the data being passed to the model—for example, if the adversary uploaded a digital image (such as a JPEG) to an internet site where it would then be processed.
In many settings, the adversary may only have access to the physical world7 in order to influence the information entering the sensors (microphone or camera, for example) from which the digital data is generated. An adversary might exploit digitally generated adversarial patches in the form of 2D printouts or even 3D objects within a scene. Sharif et al. successfully demonstrated this idea in “Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition”8 by using adversarial glasses that enable their wearers to confuse facial recognition software. Figure 1-7 shows an example of such glasses.
Perturbation attacks are obviously far more difficult to achieve when the adversary has no ability to affect the digital representation of the data. An often-cited scenario of a physical-world adversarial attack is an environmental change made to confuse an autonomous vehicle whose decisions as to steering, response, speed, and so on are based on processing image data captured by its cameras. The vehicle’s behavior might therefore be susceptible to changes to road markings, patterns on other vehicles, or road signs. Eykholt et al. have shown that it is possible to generate adversarial attacks in the real world based on the principles of perturbation.9
Figure 1-8 shows a perturbation attack using a simple traffic stop sign. The attack causes the DNN to misinterpret the sign and therefore has the potential to fool an autonomous vehicle.
An interesting aspect of these physical-world examples is that, unlike the digital perturbation attacks previously described, they are often clearly detectable by human beings. The aim of the adversary is often to make the environmental change something that we would not notice as unusual. For example, in Figure 1-8, we can see the perturbation on the stop sign, but we might not recognize it as suspicious; it’s designed to mimic graffiti and therefore appear benign to an onlooker.
It’s also feasible to generate adversarial sound using broadly similar approaches. Adversarial speech can be created and disguised within other speech, sounds, or even silence, presenting a threat to voice-controlled systems (such as voice-controlled digital assistants).10
The Broader Field of “Adversarial Machine Learning”
This book is about adversarial examples for image and audio neural network processing. However, this forms part of a broader group of attacks that fall under the more general term adversarial machine learning or adversarial ML. Adversarial ML incorporates all potential attacks on machine learned algorithms (DNNs and other more traditional ML algorithms) and all types of data.11
Adversarial examples are sometimes referred to as evasion attacks, where an evasion attack is the modification of input to avoid detection by an ML algorithm. However, this terminology is not always accurate: adversarial input may be used for purposes other than evasion. Many of the attacks discussed in this book are evasion attacks, but some are not. For example, a system could be flooded with adversarial examples, causing it to generate many false positives and potentially leading to a denial of service.
Possible other attacks that you might come across within the field of adversarial ML are:
- Poisoning attacks: A poisoning attack is when malicious data is deliberately introduced into the training dataset, resulting in the algorithm being incorrectly learned. Systems that continually learn based on data acquired from untrusted sources are susceptible to this type of attack. This book is not about poisoning attacks.
- ML model reverse engineering: If an adversary were to acquire a copy of an ML algorithm, it might be possible to reverse engineer the algorithm to extract potentially confidential or sensitive information pertaining to the characteristics of the training data. This book does not address attacks of this type.
Implications of Adversarial Input
DNN models are prevalent throughout our society, and we rely on them for many aspects of our daily lives. What’s more, many AI systems that contain these models work with data over which they have no control, often taking inputs from online digital sources and the physical world. For example:
- Facial recognition systems for access or surveillance
- Online web filters to detect upload of offensive or illegal imagery
- Autonomous vehicles that function in unconstrained physical environments
- Telephone voice-based fraud detection
- Digital assistants that act upon voice commands
If DNNs can be so easily fooled by adversarial examples, does this present a cyberthreat to AI solutions that ingest data from untrusted sources? How much risk does adversarial input really pose to the security and integrity of systems in our day-to-day lives? Finally, what mitigation strategies might the developers of these systems use to ensure that this attack vector cannot be exploited?
To address these questions, we need to understand the motivations and capabilities of adversaries. We need to understand why DNNs fall for adversarial input and what can be done to make the DNNs less susceptible to this trickery. We also need to understand the broader processing chains of which DNNs are a part and how this processing might make systems more (or less) robust to attack. There is currently no known mechanism to make DNNs entirely resilient to adversarial input, but understanding both the perspective of the attacker and that of the defending organization will enable the development of better protective measures.
Adversarial input is also interesting from another perspective: the differences between the ways in which humans and DNNs process information highlight the discrepancies between biological and artificial neural networks. While it is true that DNNs were initially inspired by neuroscience and that the word neural appears within the DNN nomenclature, the discipline of developing effective deep learning has become primarily the domain of mathematicians and data scientists. A DNN is essentially a complex mathematical function that takes data as its input and generates a result. The training of a DNN model is simply an exercise in mathematical optimization: how to iteratively change aspects of the complex function to best improve its accuracy. The mere existence of adversarial inputs suggests that any notion that an approximation of human thinking is embodied in DNNs is fundamentally flawed.
Finally, we must not forget that the risk of introducing disruptive, fraudulent, or harmful behavior into a computer system exists if any of the algorithms in the system are flawed when presented with untested input. Ensuring that a computer system is robust against adversarial input should be an intrinsic part of the broader assurance of any computer system that contains machine learned algorithms.
1 Determining the kind of input data that this DNN would need to perform the required task is left as an exercise for the reader.
2 Christian Szegedy et al., “Intriguing Properties of Neural Networks,” ICLR (2014), http://bit.ly/2X2nu9c.
3 A training set will typically contain thousands of examples, but it is still a small proportion of the possible inputs.
4 As demonstrated in Nicholas Carlini and David Wagner, “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text,” IEEE Deep Learning and Security Workshop (2018), http://bit.ly/2IFXT1W.
5 Anh Nguyen et al., “Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images,” Computer Vision and Pattern Recognition (2015), http://bit.ly/2ZKc1wW.
6 Tom B. Brown et al., “Adversarial Patch” (2017), http://bit.ly/2IDPonT.
7 Throughout this book, the term physical world is used to refer to the aspects of the world that exist outside the digital computer domain.
8 Mahmood Sharif et al., “Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition,” Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016), http://bit.ly/2x1Nebf.
9 Kevin Eykholt et al., “Robust Physical-World Attacks on Deep Learning Visual Classification,” Computer Vision and Pattern Recognition (2018), http://bit.ly/2FmJPbz.
10 See for example Carlini and Wagner, “Audio Adversarial Examples.”
11 On first hearing the term adversarial machine learning, it might be misleading as it could also be interpreted to mean machine learning used by an adversary to attack a system.