Chapter 3: Descriptive Analytics for Fraud Detection

Introduction

Descriptive analytics, or unsupervised learning, aims to find anomalous behavior that deviates from the average behavior, or norm (Bolton and Hand 2002). This norm can be defined in various ways: as the behavior of the average customer at a snapshot in time, as the average behavior of a given customer across a particular time period, or as a combination of both. Predictive analytics, or supervised learning, as will be discussed in the following chapter, assumes the availability of a historical data set with known fraudulent transactions. The analytical models built can thus only detect fraud patterns as they occurred in the past, so they cannot detect previously unknown types of fraud. Predictive analytics can, however, also help explain the anomalies found by descriptive analytics, as we will discuss later.
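To make the two definitions of the norm concrete, the following is a minimal sketch and not a method taken from this chapter. It assumes a simple z-score as the measure of deviation, and the spending data, customer labels, and function names are all hypothetical. It scores each customer's latest observation once against the average customer at that snapshot and once against the customer's own earlier behavior.

```python
from statistics import mean, pstdev

# Hypothetical monthly spend per customer, oldest value first (illustrative data only).
history = {
    "A": [100, 110, 95, 105, 100],
    "B": [500, 520, 480, 510, 505],
    "C": [100, 105, 95, 100, 400],
}


def z_score(value, reference):
    """Standardized deviation of `value` from the mean of `reference`.

    The z-score is an assumed, deliberately simple deviation measure.
    """
    mu = mean(reference)
    sigma = pstdev(reference)
    return 0.0 if sigma == 0 else (value - mu) / sigma


def cross_sectional_scores(history):
    """Norm = the average customer at the most recent snapshot in time."""
    latest = {customer: amounts[-1] for customer, amounts in history.items()}
    reference = list(latest.values())
    return {customer: z_score(value, reference) for customer, value in latest.items()}


def longitudinal_scores(history):
    """Norm = each customer's own average behavior over the earlier periods."""
    return {
        customer: z_score(amounts[-1], amounts[:-1])
        for customer, amounts in history.items()
    }


snapshot = cross_sectional_scores(history)
own = longitudinal_scores(history)

for customer in history:
    print(f"{customer}: snapshot z = {snapshot[customer]:+.2f}, own-history z = {own[customer]:+.2f}")
```

With this illustrative data, customer C's latest value lies far from C's own history, yet it is not the most extreme value at the snapshot. The choice of norm therefore changes which observations look anomalous, which is one reason the two views can usefully be combined.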

When used for fraud detection, unsupervised learning is often referred to as anomaly detection, since it aims at finding anomalous and thus suspicious observations. In the literature, anomalies are commonly described as outliers or exceptions. One of the first definitions of an outlier was provided by Grubbs (1969), as follows:

“An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.”

A first challenge when using unsupervised learning is to define the average behavior or norm. Typically, ...

