Chapter 11. Methods in Interpretability

Overview

The field of interpretability is broad and applies to a wide variety of tasks. Simply put, interpretability refers to a model’s ability to “explain” its decision making to a third party. Many modern architectures lack this capability by construction; the neural network is a prime example. The term “opaque” is often used to describe neural networks, both in the media and in the literature. This is because, without post hoc techniques to explain the final classification or regression result of a neural network, the data transformations occurring within the trained model are unclear and difficult for the end user to interpret. All we know is that we fed in an example and out popped a result. Although we can examine the learned weights of a neural network, the composition of all of these weights is an extremely complex function, which makes it difficult to tell which parts of the input contributed most to the final result.

A variety of post hoc methodologies have been designed to explain the output of a neural network; saliency mapping is a prime example. Saliency mapping measures the gradient of the output of a trained model with respect to the input. By the definition of the gradient, the input positions with the highest-magnitude gradients are those that affect the output value (or class, in the case of classification) the most when their values are perturbed.
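
As a rough illustration of this idea, the sketch below computes a saliency map in PyTorch by backpropagating the predicted class score to the input. The model and input here are placeholder stand-ins rather than a network from the text; any differentiable classifier would be treated the same way.

import torch
import torch.nn as nn

# Hypothetical trained classifier (a stand-in for a real model).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
model.eval()

# A single example; requires_grad=True lets us take gradients w.r.t. the input.
x = torch.randn(1, 1, 28, 28, requires_grad=True)

# Forward pass: take the score of the predicted class.
logits = model(x)
predicted_class = logits.argmax(dim=1).item()
score = logits[0, predicted_class]

# Backward pass: gradient of that score with respect to the input positions.
score.backward()

# The saliency map is the magnitude of the gradient at each input position;
# larger values indicate inputs whose perturbation most changes the score.
saliency = x.grad.abs().squeeze()
print(saliency.shape)  # torch.Size([28, 28])

In practice the resulting map is usually visualized as a heatmap over the input so that the most influential regions can be inspected directly.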
