
Machine Learning and Security

by Clarence Chio, David Freeman

February 2018

Intermediate to advanced

383 pages

11h 30m

English

O'Reilly Media, Inc.


Preface
  What’s In This Book?
  Who Is This Book For?
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Safari
  How to Contact Us
  Acknowledgments
1. Why Machine Learning and Security?
  Cyber Threat Landscape
  The Cyber Attacker’s Economy
  A Marketplace for Hacking Skills
  Indirect Monetization
  The Upshot
  What Is Machine Learning?
  What Machine Learning Is Not
  Adversaries Using Machine Learning
  Real-World Uses of Machine Learning in Security
  Spam Fighting: An Iterative Approach
  Limitations of Machine Learning in Security
2. Classifying and Clustering
  Machine Learning: Problems and Approaches
  Machine Learning in Practice: A Worked Example
  Training Algorithms to Learn
  Model Families
  Loss Functions
  Optimization
  Supervised Classification Algorithms
  Logistic Regression
  Decision Trees
  Decision Forests
  Support Vector Machines
  Naive Bayes
  k-Nearest Neighbors
  Neural Networks
  Practical Considerations in Classification
  Selecting a Model Family
  Training Data Construction
  Feature Selection
  Overfitting and Underfitting
  Choosing Thresholds and Comparing Models
  Clustering
  Clustering Algorithms
  Evaluating Clustering Results
  Conclusion
3. Anomaly Detection
  When to Use Anomaly Detection Versus Supervised Learning
  Intrusion Detection with Heuristics
  Data-Driven Methods
  Feature Engineering for Anomaly Detection
  Host Intrusion Detection
  Network Intrusion Detection
  Web Application Intrusion Detection
  In Summary
  Anomaly Detection with Data and Algorithms
  Forecasting (Supervised Machine Learning)
  Statistical Metrics
  Goodness-of-Fit
  Unsupervised Machine Learning Algorithms
  Density-Based Methods
  In Summary
  Challenges of Using Machine Learning in Anomaly Detection
  Response and Mitigation
  Practical System Design Concerns
  Optimizing for Explainability
  Maintainability of Anomaly Detection Systems
  Integrating Human Feedback
  Mitigating Adversarial Effects
  Conclusion
4. Malware Analysis
  Understanding Malware
  Defining Malware Classification
  Malware: Behind the Scenes
  Feature Generation
  Data Collection
  Generating Features
  Feature Selection
  From Features to Classification
  How to Get Malware Samples and Labels
  Conclusion
5. Network Traffic Analysis
  Theory of Network Defense
  Access Control and Authentication
  Intrusion Detection
  Detecting In-Network Attackers
  Data-Centric Security
  Honeypots
  Summary
  Machine Learning and Network Security
  From Captures to Features
  Threats in the Network
  Botnets and You
  Building a Predictive Model to Classify Network Attacks
  Exploring the Data
  Data Preparation
  Classification
  Supervised Learning
  Semi-Supervised Learning
  Unsupervised Learning
  Advanced Ensembling
  Conclusion
6. Protecting the Consumer Web
  Monetizing the Consumer Web
  Types of Abuse and the Data That Can Stop Them
  Authentication and Account Takeover
  Account Creation
  Financial Fraud
  Bot Activity
  Supervised Learning for Abuse Problems
  Labeling Data
  Cold Start Versus Warm Start
  False Positives and False Negatives
  Multiple Responses
  Large Attacks
  Clustering Abuse
  Example: Clustering Spam Domains
  Generating Clusters
  Scoring Clusters
  Further Directions in Clustering
  Conclusion
7. Production Systems
  Defining Machine Learning System Maturity and Scalability
  What’s Important for Security Machine Learning Systems?
  Data Quality
  Problem: Bias in Datasets
  Problem: Label Inaccuracy
  Solutions: Data Quality
  Problem: Missing Data
  Solutions: Missing Data
  Model Quality
  Problem: Hyperparameter Optimization
  Solutions: Hyperparameter Optimization
  Feature: Feedback Loops, A/B Testing of Models
  Feature: Repeatable and Explainable Results
  Performance
  Goal: Low Latency, High Scalability
  Performance Optimization
  Horizontal Scaling with Distributed Computing Frameworks
  Using Cloud Services
  Maintainability
  Problem: Checkpointing, Versioning, and Deploying Models
  Goal: Graceful Degradation
  Goal: Easily Tunable and Configurable
  Monitoring and Alerting
  Security and Reliability
  Feature: Robustness in Adversarial Contexts
  Feature: Data Privacy Safeguards and Guarantees
  Feedback and Usability
  Conclusion
8. Adversarial Machine Learning
  Terminology
  The Importance of Adversarial ML
  Security Vulnerabilities in Machine Learning Algorithms
  Attack Transferability
  Attack Technique: Model Poisoning
  Example: Binary Classifier Poisoning Attack
  Attacker Knowledge
  Defense Against Poisoning Attacks
  Attack Technique: Evasion Attack
  Example: Binary Classifier Evasion Attack
  Defense Against Evasion Attacks
  Conclusion
A. Supplemental Material for Chapter 2
  More About Metrics
  Size of Logistic Regression Models
  Implementing the Logistic Regression Cost Function
  Minimizing the Cost Function

B. Integrating Open Source Intelligence
  Security Intelligence Feeds
  Geolocation
Index


Chapter 1. Why Machine Learning and Security?

In the beginning, there was spam.

As soon as academics and scientists had hooked enough computers together via the internet to create a communications network that provided value, other people realized that this medium of free transmission and broad distribution was a perfect way to advertise sketchy products, steal account credentials, and spread computer viruses.

In the intervening 40 years, the field of computer and network security has come to encompass an enormous range of threats and domains: intrusion detection, web application security, malware analysis, social network security, advanced persistent threats, and applied cryptography, just to name a few. But even today spam remains a major focus for those in the email or messaging space, and for the general public spam is probably the aspect of computer security that most directly touches their own lives.

Machine learning was not invented by spam fighters, but it was quickly adopted by statistically inclined technologists who saw its potential in dealing with a constantly evolving source of abuse. Email providers and internet service providers (ISPs) have access to a wealth of email content, metadata, and user behavior. From email data, content-based models can be built that offer a generalizable approach to recognizing spam. Metadata and entity reputations can be extracted from emails to predict the likelihood that an email is spam without even looking at its content. By instantiating ...
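To make the idea of a content-based spam model concrete, here is a minimal sketch of one classic approach: a multinomial naive Bayes filter with add-one (Laplace) smoothing that scores a message by the words it contains. This is an illustration under stated assumptions, not the authors' method. The `tokenize` helper, the `NaiveBayesSpamFilter` class, and the six-message training corpus are all hypothetical, invented only for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical helper: lowercase whitespace tokenization only.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Toy multinomial naive Bayes text classifier with add-one smoothing."""

    def fit(self, messages, labels):
        # Count documents per class (for priors) and tokens per class (for likelihoods).
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is every token seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_score(self, text, label):
        # Unnormalized log posterior: log P(label) + sum of log P(token | label).
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # ignore tokens never seen in training
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, text):
        # Return the label with the highest log score.
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical toy corpus, for illustration only.
train_messages = [
    "win a free prize now",
    "cheap pills free offer",
    "claim your free prize",
    "meeting agenda for monday",
    "lunch tomorrow with the team",
    "project report attached",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

clf = NaiveBayesSpamFilter().fit(train_messages, train_labels)
print(clf.predict("free prize offer"))     # -> spam
print(clf.predict("team meeting monday"))  # -> ham
```

Summing log probabilities rather than multiplying raw probabilities avoids floating-point underflow on long messages. A real filter would use far richer tokenization and, as the text notes, could also draw on metadata and sender reputation rather than content alone.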



ISBN: 9781491979891
