Skip to Content
SQL for Data Analysis
book

SQL for Data Analysis

by Cathy Tanimura
September 2021
Beginner to intermediate
357 pages
9h 53m
English
O'Reilly Media, Inc.
Book available
Content preview from SQL for Data Analysis

Chapter 6. Anomaly Detection

An anomaly is something that is different from other members of the same group. In data, an anomaly is a record, an observation, or a value that differs from the remaining data points in a way that raises concerns or suspicions. Anomalies go by a number of different names, including outliers, novelties, noise, deviations, and exceptions, to name a few. I’ll use the terms anomaly and outlier interchangeably throughout this chapter, and you may see the other terms used in discussions of this topic as well. Anomaly detection can be the end goal of an analysis or a step within a broader analysis project.

Anomalies typically have one of two sources: real events that are extreme or otherwise unusual, or errors introduced during data collection or processing. While many of the steps used to detect outliers are the same regardless of the source, how we choose to handle a particular anomaly depends on the root cause. As a result, understanding the root cause and distinguishing between the two types of causes is important to the analysis process.

Real events can generate outliers for a variety of reasons. Anomalous data can signal fraud, network intrusion, structural defects in a product, loopholes in policies, or product use that wasn’t intended or envisioned by the developers. Anomaly detection is widely used to root out financial fraud, and cybersecurity also makes use of this type of analysis. Sometimes anomalous data results not because a bad actor is trying ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

SQL for Data Analytics

SQL for Data Analytics

Upom Malik, Matt Goldwasser, Benjamin Johnston
Analytics Engineering with SQL and dbt

Analytics Engineering with SQL and dbt

Rui Pedro Machado, Helder Russa

Publisher Resources

ISBN: 9781492088776Errata PageSupplemental Content