Think Stats, 2nd Edition

Name: Think Stats, 2nd Edition
Author: Allen B. Downey
ISBN: 9781491907368

October 2014

Beginner

226 pages

5h 42m

English

O'Reilly Media, Inc.

Preface
How I Wrote This Book, Using the Code, Contributor List, Safari® Books Online, How to Contact Us
1. Exploratory Data Analysis
A Statistical Approach, The National Survey of Family Growth, Importing the Data, DataFrames, Variables, Transformation, Validation, Interpretation, Exercises, Glossary
2. Distributions
Representing Histograms, Plotting Histograms, NSFG Variables, Outliers, First Babies, Summarizing Distributions, Variance, Effect Size, Reporting Results, Exercises, Glossary
3. Probability Mass Functions
Pmfs, Plotting PMFs, Other Visualizations, The Class Size Paradox, DataFrame Indexing, Exercises, Glossary
4. Cumulative Distribution Functions
The Limits of PMFs, Percentiles, CDFs, Representing CDFs, Comparing CDFs, Percentile-Based Statistics, Random Numbers, Comparing Percentile Ranks, Exercises, Glossary
5. Modeling Distributions
The Exponential Distribution, The Normal Distribution, Normal Probability Plot, The lognormal Distribution, The Pareto Distribution, Generating Random Numbers, Why Model?, Exercises, Glossary
6. Probability Density Functions
PDFs, Kernel Density Estimation, The Distribution Framework, Hist Implementation, Pmf Implementation, Cdf Implementation, Moments, Skewness, Exercises, Glossary
7. Relationships Between Variables
Scatter Plots, Characterizing Relationships, Correlation, Covariance, Pearson’s Correlation, Nonlinear Relationships, Spearman’s Rank Correlation, Correlation and Causation, Exercises, Glossary
8. Estimation
The Estimation Game, Guess the Variance, Sampling Distributions, Sampling Bias, Exponential Distributions, Exercises, Glossary
9. Hypothesis Testing
Classical Hypothesis Testing, HypothesisTest, Testing a Difference in Means, Other Test Statistics, Testing a Correlation, Testing Proportions, Chi-Squared Tests, First Babies Again, Errors, Power, Replication, Exercises, Glossary
10. Linear Least Squares
Least Squares Fit, Implementation, Residuals, Estimation, Goodness of Fit, Testing a Linear Model, Weighted Resampling, Exercises, Glossary
11. Regression
StatsModels, Multiple Regression, Nonlinear Relationships, Data Mining, Prediction, Logistic Regression, Estimating Parameters, Implementation, Accuracy, Exercises, Glossary
12. Time Series Analysis
Importing and Cleaning, Plotting, Linear Regression, Moving Averages, Missing Values, Serial Correlation, Autocorrelation, Prediction, Further Reading, Exercises, Glossary
13. Survival Analysis
Survival Curves, Hazard Function, Estimating Survival Curves, Kaplan-Meier Estimation, The Marriage Curve, Estimating the Survival Function, Confidence Intervals, Cohort Effects, Extrapolation, Expected Remaining Lifetime, Exercises, Glossary
14. Analytic Methods
Normal Distributions, Sampling Distributions, Representing Normal Distributions, Central Limit Theorem, Testing the CLT, Applying the CLT, Correlation Test, Chi-Squared Test, Discussion, Exercises
Index
Colophon
Copyright

Content preview from Think Stats, 2nd Edition

Chapter 1. Exploratory Data Analysis

The thesis of this book is that data combined with practical methods can answer questions and guide decisions under uncertainty.

As an example, I present a case study motivated by a question I heard when my wife and I were expecting our first child: do first babies tend to arrive late?

If you Google this question, you will find plenty of discussion. Some people claim it’s true, others say it’s a myth, and some people say it’s the other way around: first babies come early.

In many of these discussions, people provide data to support their claims. I found many examples like these:

“My two friends that have given birth recently to their first babies, BOTH went almost 2 weeks overdue before going into labour or being induced.”
“My first one came 2 weeks late and now I think the second one is going to come out two weeks early!!”
“I don’t think that can be true because my sister was my mother’s first and she was early, as with many of my cousins.”

Reports like these are called anecdotal evidence because they are based on data that is unpublished and usually personal. In casual conversation, there is nothing wrong with anecdotes, so I don’t mean to pick on the people I quoted.

But we might want evidence that is more persuasive and an answer that is more reliable. By those standards, anecdotal evidence usually fails, because:

Small number of observations: If pregnancy length is longer for first babies, the difference is probably small compared to natural variation. ...
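
The point about small samples can be illustrated with a short simulation. The sketch below is not taken from the book: the mean of 39 weeks, the standard deviation of 2.7 weeks, and the true difference of 0.1 weeks are hypothetical values chosen only for illustration, and the function names are invented here. It draws pregnancy lengths for two groups from normal distributions and counts how often the observed difference in means points the wrong way, suggesting that first babies arrive earlier even though, by construction, they arrive slightly later.

```python
import random
import statistics

# Hypothetical parameters, chosen for illustration only (not from the book).
BASELINE_WEEKS = 39.0   # assumed mean pregnancy length for non-first babies
NATURAL_SD = 2.7        # assumed standard deviation of pregnancy length
TRUE_DIFF = 0.1         # assumed true extra length for first babies, in weeks


def simulate_mean_difference(n, rng, true_diff=TRUE_DIFF, sd=NATURAL_SD):
    """Draw n lengths per group; return mean(first) - mean(other)."""
    first = [rng.gauss(BASELINE_WEEKS + true_diff, sd) for _ in range(n)]
    other = [rng.gauss(BASELINE_WEEKS, sd) for _ in range(n)]
    return statistics.fmean(first) - statistics.fmean(other)


def fraction_wrong_sign(n, trials=500, seed=1):
    """Fraction of simulated samples whose observed difference is negative,
    that is, samples that suggest the opposite of the true effect."""
    rng = random.Random(seed)
    diffs = [simulate_mean_difference(n, rng) for _ in range(trials)]
    return sum(d < 0 for d in diffs) / trials


if __name__ == "__main__":
    for n in (2, 20, 2000):
        print(f"n = {n:5d} per group: wrong sign in "
              f"{fraction_wrong_sign(n):.0%} of samples")
```

Under these assumptions, a sample of two births per group, which is roughly what the anecdotes above amount to, gives the wrong direction almost as often as a coin flip would, while a sample of thousands does so far less often.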

Publisher Resources

ISBN: 9781491907344
