Doing Data Science

by Cathy O'Neil, Rachel Schutt
October 2013
Beginner
405 pages
10h 9m
English
O'Reilly Media, Inc.

Chapter 2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process

We begin this chapter with a discussion of statistical inference and statistical thinking. Next we explore what we feel every data scientist should do once they’ve gotten data in hand for any data-related project: exploratory data analysis (EDA).

From there, we move into looking at what we’re defining as the data science process in a little more detail. We’ll end with a thought experiment and a case study.

Statistical Thinking in the Age of Big Data

Big Data is a vague term, used loosely, if often, these days. But put simply, the catchall phrase means three things. First, it is a bundle of technologies. Second, it is a potential revolution in measurement. And third, it is a point of view, or philosophy, about how decisions will be—and perhaps should be—made in the future.

Steve Lohr, The New York Times

When you’re developing your skill set as a data scientist, certain foundational pieces need to be in place first: statistics, linear algebra, and some programming. Even once you have those pieces, part of the challenge is that you will be developing several interdependent skill sets in parallel: data preparation and munging, modeling, coding, visualization, and communication. As we progress through the book, these threads will be intertwined. That said, we need to start somewhere, and will begin by getting grounded in statistical inference.

We expect the readers of this ...

Publisher Resources

ISBN: 9781449363871