Practical Machine Learning: A New Look at Anomaly Detection

by Ted Dunning, Ellen Friedman

August 2014

Intermediate to advanced

66 pages

1h 25m

English

O'Reilly Media, Inc.

1. Looking Toward the Future
2. The Shape of Anomaly Detection
    Finding “Normal”
    If you enjoy math, read this description of a probabilistic model of “normal”…
    Human Insight Helps
    Finding Anomalies
    Once again, if you like math, this description of anomalies is for you…
    Take-Home Lesson: Key Steps in Anomaly Detection
    A Simple Approach: Threshold Models
3. Using t-Digest for Threshold Automation
    The Philosophy Behind Setting the Threshold
    Using t-Digest for Accurate Calculation of Extreme Quantiles
    Issues with Simple Thresholds
4. More Complex, Adaptive Models
    Windows and Clusters
    Matches with the Windowed Reconstruction: Normal Function
    Mismatches with the Windowed Reconstruction: Anomalous Function
    A Powerful But Simple Technique
    Looking Toward Modeling More Problematic Inputs
5. Anomalies in Sporadic Events
    Counts Don’t Work Well
    Arrival Times Are the Key
    And Now with the Math…
    Event Rate in a Worked Example: Website Traffic Prediction
    Extreme Seasonality Effects
6. No Phishing Allowed!
    The Phishing Attack
    The No-Phishing-Allowed Anomaly Detector
    How the Model Works
    Putting It All Together
7. Anomaly Detection for the Future
A. Additional Resources
    GitHub
    Apache Mahout Open Source Project
    Additional Publications
About the Authors
Colophon

Content preview from Practical Machine Learning: A New Look at Anomaly Detection

Chapter 6. No Phishing Allowed!

One of the most important uses of anomaly detection is identifying potentially fraudulent behavior, which reduces the risk of loss and improves security. The behavior in question could be credit card fraud, identity theft, or a phishing attack on a secure website such as an online banking site. Building an effective model and alert system is challenging, and so is staying one step (or even two) ahead of the fraudsters: as you find ways to foil their attacks, they keep looking for new ways to commit theft. This situation calls for agility, innovation, and approaches that are both practical and cost-effective.

Let’s take a look at a method that lets a machine-learning model quickly identify a hypothetical phishing attack on a bank site and flag it as suspicious. This example will extend the concepts of a probabilistic model that we have developed in previous chapters to situations that involve sequences of events.
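
To make the idea of a probabilistic model over sequences of events concrete, here is a minimal sketch. It is an illustrative assumption, not the detector the chapter goes on to describe. It fits a first-order Markov model to sessions assumed to be normal and scores a new session by its average negative log-likelihood, so a higher score means a more surprising sequence. The event names, the training data, and the function names `fit_transitions` and `surprise` are all hypothetical.

```python
import math
from collections import defaultdict

START, END = "<start>", "<end>"

def fit_transitions(sessions, smoothing=1.0):
    """Estimate smoothed first-order transition probabilities
    from sessions that are assumed to represent normal behavior."""
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for session in sessions:
        padded = [START] + list(session) + [END]
        states.update(padded)
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1.0

    def prob(a, b):
        row = counts.get(a, {})
        # Additive smoothing; the extra +1 reserves mass for unseen events.
        total = sum(row.values()) + smoothing * (len(states) + 1)
        return (row.get(b, 0.0) + smoothing) / total

    return prob

def surprise(prob, session):
    """Average negative log-likelihood per transition (higher = more unusual)."""
    padded = [START] + list(session) + [END]
    pairs = list(zip(padded, padded[1:]))
    return -sum(math.log(prob(a, b)) for a, b in pairs) / len(pairs)

# Hypothetical training data: event names are invented for illustration.
normal_sessions = (
    [["load_page", "load_images", "submit_login"]] * 50
    + [["load_page", "load_images", "load_captcha", "submit_login"]] * 10
)
prob = fit_transitions(normal_sessions)

typical = ["load_page", "load_images", "submit_login"]
unusual = ["load_captcha", "submit_login"]  # skips the usual page loads

print("typical:", round(surprise(prob, typical), 3))
print("unusual:", round(surprise(prob, unusual), 3))
```

In a sketch like this, a session would be flagged when its score exceeds a threshold chosen from the scores of known-normal sessions, in the spirit of the threshold models discussed in earlier chapters.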

The Phishing Attack

The attack lures bank customers to a fake website in order to capture their private login details. The plan also has the customer unknowingly solve the CAPTCHA security challenge on the fraudsters’ behalf, something their fraud-bot script could not do without human help. How the fraud might be attempted is described here and summarized in Figure 6-1.

Step 1: A huge number of customers receive an automated email that appears to be from the ...

Publisher Resources

ISBN: 9781491914151
