Chapter 4. Evasion Attacks: Tricking AI Models at Inference
Evasion attacks represent one of the most immediate and operationally dangerous threats in the adversarial machine learning landscape. Unlike poisoning or backdoor attacks that target the training phase, evasion occurs at inference—precisely where AI systems are deployed, trusted, and acting in real time. These attacks exploit the fragile boundaries of learned models, manipulating inputs just enough to induce misclassification without triggering human suspicion or standard validation checks. As AI becomes deeply integrated into security-sensitive environments, from identity verification to autonomous systems, the ability to reliably detect and defend against evasion becomes central to AI risk governance.
Understanding the diverse mechanisms of evasion, from gradient-based perturbation in image classifiers to subtle linguistic manipulation in text models and temporal distortion in time-series systems, is essential to securing AI deployments across modalities. These attacks show how small changes to surface-level data can produce disproportionately harmful outcomes, exposing latent weaknesses in model generalization, embedding sensitivity, and feature attention. This chapter explores how attackers tailor perturbations to bypass defenses, disrupt detection, and degrade model performance, all while staying within constraints that preserve realism and operational believability.
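To make "gradient-based perturbation" concrete, here is a minimal sketch of one well-known technique of this kind, the fast gradient sign method (FGSM). It is not presented as the method this chapter develops. The sketch assumes a toy linear logistic classifier with hypothetical weights and a three-feature input scaled to [0, 1], standing in for normalized pixel values. The attacker moves each feature by a fixed budget `eps` in the direction that increases the model's loss. Each feature changes by at most `eps`, yet the predicted class flips.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """P(class = 1 | x) for a linear logistic classifier (toy model, hypothetical)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w: Sequence[float], b: float, x: Sequence[float], y: int) -> List[float]:
    """Gradient of the binary cross-entropy loss with respect to the INPUT x.

    For logistic regression this has the closed form dL/dx = (p - y) * w,
    so no autodiff library is needed for this illustration.
    """
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps, lo=0.0, hi=1.0) -> List[float]:
    """Fast gradient sign method: x_adv = clip(x + eps * sign(dL/dx), lo, hi).

    Clipping keeps the perturbed input in the valid feature range, which is
    one simple form of the "realism" constraint an attacker must respect.
    """
    grad = input_gradient(w, b, x, y)
    return [min(hi, max(lo, xi + eps * sign(g))) for xi, g in zip(x, grad)]


# Hypothetical model and input, chosen only for illustration.
w = [1.0, -2.0, 0.5]
b = 0.0
x = [0.6, 0.2, 0.4]
y = 1  # true label

p_clean = predict_proba(w, b, x)
x_adv = fgsm(w, b, x, y, eps=0.15)
p_adv = predict_proba(w, b, x_adv)

print(round(p_clean, 3), int(p_clean > 0.5))  # 0.599 1
print(round(p_adv, 3), int(p_adv > 0.5))      # 0.469 0
```

For a linear model, the step shifts the logit by `eps * sum(|w_i|)` (here 0.15 × 3.5 = 0.525). The largest weights therefore dictate how small a budget is enough to cross the decision boundary. Deep networks are attacked the same way, except the input gradient comes from backpropagation rather than a closed form.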
From automated perturbation pipelines to physical-world ...