Machine Learning for Email
by Drew Conway, John Myles White
October 2011
Intermediate to advanced
142 pages
4h 15m
English
O'Reilly Media, Inc.
Content preview from Machine Learning for Email

Chapter 2. Data Exploration

Exploration vs. Confirmation

Whenever you work with data, it’s helpful to imagine breaking up your analysis into two completely separate parts: exploration and confirmation. The distinction between exploratory data analysis and confirmatory data analysis comes down to us from the famous John Tukey,[6] who emphasized the importance of designing simple tools for practical data analysis. In Tukey’s mind, the exploratory steps in data analysis involve using summary tables and basic visualizations to search for hidden patterns in your data. In this chapter, we’ll describe some of the basic tools that R provides for summarizing your data numerically and then we’ll teach you how to make sense of the results. After that, we’ll show you some of the tools that exist in R for visualizing your data; at the same time, we’ll give you a whirlwind tour of the basic visual patterns that you should keep an eye out for in any visualization.

But before you start searching through your first data set, we should warn you about a real danger that’s present whenever you explore data: you’re likely to find patterns that aren’t really there. The human mind is designed to find patterns in the world and will do so even when those patterns are just quirks of chance. You don’t need a degree in statistics to know that we human beings will easily find shapes in clouds after looking at them for only a few seconds. And plenty of people have convinced themselves that they’ve discovered ...
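
The warning above about patterns that are only quirks of chance can be illustrated with a small simulation. The following is a minimal sketch in R and is not taken from the book: the variable names (`n.obs`, `n.vars`, `noise`, `pair.cors`, `strongest`) and the sizes chosen are assumptions made purely for illustration. It fills a table with random numbers that are unrelated by construction, then looks for the strongest correlation between any two columns.

```r
# Illustrative sketch (not from the book): search pure noise for "patterns".
set.seed(1)    # fixed seed so the run is reproducible

n.obs <- 30    # assumed number of observations per variable
n.vars <- 20   # assumed number of unrelated variables

# Every column is independent random noise, so no real relationship exists.
noise <- matrix(rnorm(n.obs * n.vars), nrow = n.obs, ncol = n.vars)

# Correlate every column with every other column.
cors <- cor(noise)

# Keep each pair of columns once (upper triangle, diagonal excluded).
pair.cors <- cors[upper.tri(cors)]
n.pairs <- length(pair.cors)

# The strongest apparent relationship among all pairs of unrelated columns.
strongest <- max(abs(pair.cors))

print(n.pairs)
print(strongest)
```

With 20 columns there are 190 distinct pairs to compare, so the largest correlation in the batch will typically look far stronger than what any single pair of unrelated columns would suggest. Any pattern found this way during exploration still needs to be checked against data that was not used to find it.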

ISBN: 9781449314835