Machine Learning for Email

by Drew Conway, John Myles White

October 2011

Intermediate to advanced

142 pages

4h 15m

English

O'Reilly Media, Inc.

Preface
    Machine Learning for Hackers: Email
    How This Book is Organized
    Conventions Used in This Book
    Using Code Examples
    Safari® Books Online
    How to Contact Us
1. Using R
    R for Machine Learning
    Downloading and Installing R
    Windows
    Mac OS X
    Linux
    IDEs and Text Editors
    Loading and Installing R Packages
    R Basics for Machine Learning
    Loading libraries and the data
    Converting date strings, and dealing with malformed data
    Organizing location data
    Dealing with data outside our scope
    Aggregating and organizing the data
    Analyzing the data
    Further Reading on R
2. Data Exploration
    Exploration vs. Confirmation
    What is Data?
    Inferring the Types of Columns in Your Data
    Inferring Meaning
    Numeric Summaries
    Means, Medians, and Modes
    Quantiles
    Standard Deviations and Variances
    Exploratory Data Visualization
    Modes
    Skewness
    Thin Tails vs. Heavy Tails
    Visualizing the Relationships between Columns
3. Classification: Spam Filtering
    This or That: Binary Classification
    Moving Gently into Conditional Probability
    Writing Our First Bayesian Spam Classifier
    Defining the Classifier and Testing It with Hard Ham
    Testing the Classifier Against All Email Types
    Improving the Results
4. Ranking: Priority Inbox
    How Do You Sort Something When You Don’t Know the Order?
    Ordering Email Messages by Priority
    Priority Features Email
    Writing a Priority Inbox
    Functions for Extracting the Feature Set
    Creating a Weighting Scheme for Ranking
    A Log-Weighting Scheme
    Weighting from Email Thread Activity
    Training and Testing the Ranker
Works Cited
About the Authors
Copyright

Content preview from Machine Learning for Email

Preface

Machine Learning for Hackers: Email

To explain the perspective from which this book was written, it will be helpful to define the terms machine learning and hackers.

What is machine learning? At the highest level of abstraction, we can think of machine learning as a set of tools and methods that attempt to infer patterns and extract insight from a record of the observable world. For example, if we’re trying to teach a computer to recognize the zip codes written on the fronts of envelopes, our data may consist of photographs of the envelopes along with a record of the zip code that each envelope was addressed to. That is, within some context we can take a record of the actions of our subjects, learn from this record, and then create a model of these activities that will inform our understanding of this context going forward. In practice, this requires data, and in contemporary applications this often means a lot of data (several terabytes). Most machine learning techniques take the availability of such a data set as given—which, in light of the quantities of data that are produced in the course of running modern companies, means new opportunities.

What is a hacker? Far from the stylized depictions of nefarious teenagers or Gibsonian cyber-punks portrayed in pop culture, we believe a hacker is someone who likes to solve problems and experiment with new technologies. If you’ve ever sat down with the latest O’Reilly book on a new computer language and knuckled out code until you ...

Publisher Resources

ISBN: 9781449314835
