Think Stats, 2nd Edition

Name: Think Stats, 2nd Edition
Author: Allen B. Downey
ISBN: 9781491907368

by Allen B. Downey

October 2014

Beginner

226 pages

5h 42m

English

O'Reilly Media, Inc.

Preface
How I Wrote This Book; Using the Code; Contributor List; Safari® Books Online; How to Contact Us
1. Exploratory Data Analysis
A Statistical Approach; The National Survey of Family Growth; Importing the Data; DataFrames; Variables; Transformation; Validation; Interpretation; Exercises; Glossary
2. Distributions
Representing Histograms; Plotting Histograms; NSFG Variables; Outliers; First Babies; Summarizing Distributions; Variance; Effect Size; Reporting Results; Exercises; Glossary
3. Probability Mass Functions
Pmfs; Plotting PMFs; Other Visualizations; The Class Size Paradox; DataFrame Indexing; Exercises; Glossary
4. Cumulative Distribution Functions
The Limits of PMFs; Percentiles; CDFs; Representing CDFs; Comparing CDFs; Percentile-Based Statistics; Random Numbers; Comparing Percentile Ranks; Exercises; Glossary
5. Modeling Distributions
The Exponential Distribution; The Normal Distribution; Normal Probability Plot; The lognormal Distribution; The Pareto Distribution; Generating Random Numbers; Why Model?; Exercises; Glossary
6. Probability Density Functions
PDFs; Kernel Density Estimation; The Distribution Framework; Hist Implementation; Pmf Implementation; Cdf Implementation; Moments; Skewness; Exercises; Glossary
7. Relationships Between Variables
Scatter Plots; Characterizing Relationships; Correlation; Covariance; Pearson’s Correlation; Nonlinear Relationships; Spearman’s Rank Correlation; Correlation and Causation; Exercises; Glossary
8. Estimation
The Estimation Game; Guess the Variance; Sampling Distributions; Sampling Bias; Exponential Distributions; Exercises; Glossary
9. Hypothesis Testing
Classical Hypothesis Testing; HypothesisTest; Testing a Difference in Means; Other Test Statistics; Testing a Correlation; Testing Proportions; Chi-Squared Tests; First Babies Again; Errors; Power; Replication; Exercises; Glossary
10. Linear Least Squares
Least Squares Fit; Implementation; Residuals; Estimation; Goodness of Fit; Testing a Linear Model; Weighted Resampling; Exercises; Glossary
11. Regression
StatsModels; Multiple Regression; Nonlinear Relationships; Data Mining; Prediction; Logistic Regression; Estimating Parameters; Implementation; Accuracy; Exercises; Glossary
12. Time Series Analysis
Importing and Cleaning; Plotting; Linear Regression; Moving Averages; Missing Values; Serial Correlation; Autocorrelation; Prediction; Further Reading; Exercises; Glossary
13. Survival Analysis
Survival Curves; Hazard Function; Estimating Survival Curves; Kaplan-Meier Estimation; The Marriage Curve; Estimating the Survival Function; Confidence Intervals; Cohort Effects; Extrapolation; Expected Remaining Lifetime; Exercises; Glossary
14. Analytic Methods
Normal Distributions; Sampling Distributions; Representing Normal Distributions; Central Limit Theorem; Testing the CLT; Applying the CLT; Correlation Test; Chi-Squared Test; Discussion; Exercises
Index
Colophon
Copyright

Content preview from Think Stats, 2nd Edition

Preface

This book is an introduction to the practical tools of exploratory data analysis. The organization of the book follows the process I use when I start working with a dataset:

Importing and cleaning: Whatever format the data is in, it usually takes some time and effort to read the data, clean and transform it, and check that everything made it through the translation process intact.
Single variable explorations: I usually start by examining one variable at a time, finding out what the variables mean, looking at distributions of the values, and choosing appropriate summary statistics.
Pair-wise explorations: To identify possible relationships between variables, I look at tables and scatter plots, and compute correlations and linear fits.
Multivariate analysis: If there are apparent relationships between variables, I use multiple regression to add control variables and investigate more complex relationships.
Estimation and hypothesis testing: When reporting statistical results, it is important to answer three questions: How big is the effect? How much variability should we expect if we run the same measurement again? Is it possible that the apparent effect is due to chance?
Visualization: During exploration, visualization is an important tool for finding possible relationships and effects. Then if an apparent effect holds up to scrutiny, visualization is an effective way to communicate results.
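
As a rough illustration of how several of these steps might look in code, here is a minimal Python sketch that uses only the standard library. It is not taken from the book: the data is synthetic, and the names `records`, `pearson`, `diff_means`, and `permutation_p_value` are hypothetical helpers invented for this example. Multiple regression and visualization are left out to keep it short.

```python
import random
import statistics

# Synthetic, made-up records (not from the book): (group, x, y).
rng = random.Random(17)
records = []
for i in range(200):
    group = "a" if i % 2 == 0 else "b"
    x = rng.gauss(10, 2)
    y = 3 + 0.5 * x + rng.gauss(0, 1) + (0.4 if group == "b" else 0.0)
    records.append((group, x, y))
records[5] = ("a", None, 7.0)  # simulate one missing value

# Importing and cleaning: drop rows with a missing field.
clean = [r for r in records if r[1] is not None and r[2] is not None]

# Single variable exploration: summary statistics for y.
ys = [r[2] for r in clean]
xs = [r[1] for r in clean]
summary = {"n": len(ys), "mean": statistics.mean(ys), "stdev": statistics.stdev(ys)}


# Pair-wise exploration: Pearson correlation between x and y.
def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


r = pearson(xs, ys)


# Estimation: how big is the effect? Here, the difference in group means.
def diff_means(a, b):
    return statistics.mean(b) - statistics.mean(a)


# Hypothesis testing: could the apparent effect be due to chance?
# A permutation test shuffles the pooled values and counts how often
# a difference at least as large as the observed one appears.
def permutation_p_value(a, b, iters=1000, seed=1):
    shuffler = random.Random(seed)
    observed = abs(diff_means(a, b))
    pooled = a + b
    count = 0
    for _ in range(iters):
        shuffler.shuffle(pooled)
        if abs(diff_means(pooled[:len(a)], pooled[len(a):])) >= observed:
            count += 1
    return count / iters


group_a = [r_[2] for r_ in clean if r_[0] == "a"]
group_b = [r_[2] for r_ in clean if r_[0] == "b"]
effect = diff_means(group_a, group_b)
p_value = permutation_p_value(group_a, group_b)

print(summary)
print("correlation:", round(r, 3))
print("difference in means:", round(effect, 3), "p-value:", p_value)
```

The permutation test stands in for the computational style of analysis: instead of deriving a sampling distribution mathematically, it simulates one by reshuffling the data.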

This book takes a computational approach, which has several advantages over mathematical ...


Publisher Resources

ISBN: 9781491907344
Errata
