Machine Learning for Email

Book Description

If you’re an experienced programmer willing to crunch data, this concise guide will show you how to use machine learning to work with email. You’ll learn how to write algorithms that automatically sort and redirect email based on statistical patterns. Authors Drew Conway and John Myles White take a practical, case-study-driven approach rather than a traditional math-heavy presentation.

This book also includes a short tutorial on using the popular R language to manipulate and analyze data. You’ll get clear examples for analyzing sample data and writing machine learning programs with R.

  • Mine email content with R functions, using a collection of sample files
  • Analyze the data and use the results to write a Bayesian spam classifier
  • Rank email by importance, using factors such as thread activity
  • Use your email ranking analysis to write a priority inbox program
  • Test your classifier and priority inbox with a separate email sample set

Table of Contents

  1. Preface
    1. Machine Learning for Hackers: Email
    2. How This Book is Organized
    3. Conventions Used in This Book
    4. Using Code Examples
    5. Safari® Books Online
    6. How to Contact Us
  2. 1. Using R
    1. R for Machine Learning
      1. Downloading and Installing R
        1. Windows
        2. Mac OS X
        3. Linux
      2. IDEs and Text Editors
      3. Loading and Installing R Packages
      4. R Basics for Machine Learning
        1. Loading libraries and the data
        2. Converting date strings, and dealing with malformed data
        3. Organizing location data
        4. Dealing with data outside our scope
        5. Aggregating and organizing the data
        6. Analyzing the data
    2. Further Reading on R
  3. 2. Data Exploration
    1. Exploration vs. Confirmation
    2. What is Data?
    3. Inferring the Types of Columns in Your Data
    4. Inferring Meaning
    5. Numeric Summaries
    6. Means, Medians, and Modes
    7. Quantiles
    8. Standard Deviations and Variances
    9. Exploratory Data Visualization
      1. Modes
      2. Skewness
      3. Thin Tails vs. Heavy Tails
    10. Visualizing the Relationships between Columns
  4. 3. Classification: Spam Filtering
    1. This or That: Binary Classification
    2. Moving Gently into Conditional Probability
    3. Writing Our First Bayesian Spam Classifier
      1. Defining the Classifier and Testing It with Hard Ham
      2. Testing the Classifier Against All Email Types
      3. Improving the Results
  5. 4. Ranking: Priority Inbox
    1. How Do You Sort Something When You Don’t Know the Order?
    2. Ordering Email Messages by Priority
      1. Priority Features of Email
    3. Writing a Priority Inbox
      1. Functions for Extracting the Feature Set
      2. Creating a Weighting Scheme for Ranking
        1. A Log-Weighting Scheme
      3. Weighting from Email Thread Activity
      4. Training and Testing the Ranker
  6. Works Cited
  7. About the Authors
  8. Copyright