Chapter 3. Model Explainability and Interpretability

Making sense of a machine learning model can seem as hard as making sense of intelligence itself. Computer scientist Marvin Minsky famously described “intelligence” as a suitcase word: “a word that means nothing by itself, but holds a bunch of things inside that you have to unpack.” This becomes even more confusing when you see models that achieve superhuman performance in some tasks (for example, playing Go or chess) but fail epically in others (for example, mistaking a picture of a person on a bus for a pedestrian). Machine learning is great at fitting functions that map inputs to complex decision spaces. Problems arise when you want to understand why the model made a particular decision.

Even worse, “interpretability”—the tool you want to use to pick apart a model—may count as a suitcase word itself.

Explainability Versus Interpretability

Explainability and interpretability are often used interchangeably when it comes to making sense of ML models and their outputs. For interpretability, there are at least a few non-math-heavy definitions you could use. AI researcher Tim Miller described it as “the degree to which human beings can understand the cause of a decision,”1 while Kim et al. described it as “the degree to which a machine’s output can be consistently predicted.”2
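One way to make Kim et al.’s definition concrete is to ask how reliably a simple, human-readable model can predict a black box’s outputs, an approach often called a global surrogate. The following sketch is illustrative only (the gradient-boosted classifier and the breast cancer dataset are stand-ins, not examples from this chapter): it trains a shallow decision tree to mimic the black box and reports how often the two agree.

# Minimal sketch: measure how consistently a black box's outputs can be
# predicted by a simple surrogate model. The specific model and dataset
# are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Black-box" model whose decisions we want to anticipate
black_box = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shallow surrogate trained on the black box's predictions, not the true labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how often the surrogate predicts the same output as the black box
fidelity = np.mean(surrogate.predict(X_test) == black_box.predict(X_test))
print(f"Surrogate agreement with black box: {fidelity:.2%}")

High agreement suggests the black box’s behavior is relatively easy to anticipate; low agreement suggests its decisions resist this kind of simple summary.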

What these definitions have in common is that they focus on the decisions that a model makes. Contrast this with explainability (sometimes referred to as XAI, short for explainable AI) ...
