Statistics and hypothesis testing with Python: Essential math for data science
Take control of your data by honing your fundamental math skills
Topic: Data
Machine learning requires strong statistical foundations. To be effective at machine learning, you need to know the difference between standard error and standard deviation, what the p-value is and why it’s important when rejecting the null hypothesis, and how to calculate the standard error for hypothesis testing.
Get hands-on experience with rigorous statistical analysis: parameter estimation, hypothesis testing, p-values, z-scores, and other core concepts in statistical inference. You’ll practice on real-world datasets using Python's stats-oriented libraries to ask interesting, relevant questions and draw concrete inferences from population data.
This is the fourth course in a four-part series focused on essential math topics. We recommend taking Linear Algebra with Python, Linear Regression with Python, and Probability with Python first.
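As a rough illustration of the distinction between standard deviation and standard error mentioned above, here is a minimal sketch using only the Python standard library. The salary figures are synthetic and the helper name `standard_error` is hypothetical; neither comes from the course materials.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: sample std dev / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


rng = random.Random(0)  # fixed seed so repeated runs give the same draws

# Assumption: synthetic "salary" data drawn from a normal distribution,
# used only for illustration (not a real dataset).
small_sample = [rng.gauss(50_000, 8_000) for _ in range(25)]
large_sample = [rng.gauss(50_000, 8_000) for _ in range(2_500)]

for label, sample in (("n=25", small_sample), ("n=2500", large_sample)):
    spread = statistics.stdev(sample)  # variability of individual values
    sem = standard_error(sample)       # uncertainty of the sample mean
    print(f"{label}: std dev = {spread:.0f}, standard error = {sem:.0f}")
```

The standard deviation describes how much individual values vary and stays roughly the same as the sample grows, while the standard error describes how precisely the mean is estimated and shrinks in proportion to the square root of the sample size.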
What you'll learn and how you can apply it
By the end of this live online course, you’ll understand:
 How to use the central limit theorem in statistics
 What hypothesis testing and parameter estimation are
 How bootstrapping for parameter estimation works
And you’ll be able to:
 Perform hypothesis testing to determine if a result is statistically significant
 Calculate confidence intervals to quantify the uncertainty of a measurement
 Apply bootstrapping to determine confidence intervals for any general estimator
 Implement A/B testing
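To give a flavor of bootstrapping a confidence interval for a general estimator, here is a minimal sketch of the percentile bootstrap using only the Python standard library. The function name `bootstrap_ci`, its parameters, and the synthetic data are assumptions made for illustration; the course may use different libraries or a different bootstrap variant.

```python
import random
import statistics


def bootstrap_ci(data, estimator, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for estimator(data).

    Hypothetical helper: resamples the data with replacement, applies the
    estimator to each resample, and returns the empirical percentile bounds.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        estimator(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    # Indices of the lower and upper percentiles, clamped to a valid range.
    lo_idx = max(0, round((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Assumption: synthetic data standing in for a real sample.
data_rng = random.Random(42)
salaries = [data_rng.gauss(50_000, 8_000) for _ in range(200)]

low, high = bootstrap_ci(salaries, statistics.median)
print(f"95% bootstrap CI for the median: ({low:.0f}, {high:.0f})")
```

Because the estimator is passed in as a function, the same sketch works for the mean, a trimmed mean, or any other statistic that can be computed from a resample.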
This training course is for you because...
 You’re in a technical role, but you’re looking to transition into a data scientist or data analyst position.
 You want to apply data-driven decision making in your position.
 You work with data and want to generate insights and analysis.
Prerequisites
 A basic understanding of Python (variable creation, conditional statements, functions, and loops) and statistical values (mean, median, and mode)
Recommended preparation:
 Take Linear Algebra with Python (live online training course)
 Take Linear Regression with Python (live online training course)
 Take Probability with Python (live online training course)
Recommended follow-up:
 Read Think Stats, 2nd Edition (book)
 Read Data Science from Scratch, 2nd Edition (book)
 Read Doing Math with Python (book)
 Watch Just Enough Math (video course)
About your instructor

Michael holds a master’s degree in statistics and a bachelor’s degree in mathematics. His academic research areas ranged from computational paleobiology, where he developed software for measuring evidence for disparate evolutionary models based on fossil data, to music and AI, where he assisted in modeling musical data for a jazz improvisation robot.
In his current work, Michael teaches hands-on courses in data science as well as business-oriented topics in managing data science initiatives at the organizational level. Aside from teaching, he leads internal data science projects for Pragmatic Institute in support of the marketing and operations teams. In his free time, he applies his math and programming skills toward creating code-based visual art and design projects.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction to statistics and statistical inference (10 minutes)
 Lecture: The Jupyter Notebook environment; statistical inference
 Group discussion: Do union or nonunion construction workers have higher salaries?
Hypothesis testing of the mean (10 minutes)
 Lecture: Mean estimation
 Hands-on exercise: Generate income samples for union workers
Standard error of the mean (10 minutes)
 Lecture: Standard error of the mean
 Hands-on exercise: Estimate the standard error
 Group discussion: Increasing sample size
Confidence intervals (10 minutes)
 Lecture: Confidence intervals
 Hands-on exercise: Calculate the confidence interval for the unionized salary mean
Hypothesis testing of two means (5 minutes)
 Lecture: Null hypothesis
 Group discussion: Rejecting the null hypothesis with unknown population means
Estimating variance (5 minutes)
 Lecture: Unbiased estimator
Student’s t-distribution (15 minutes)
 Lecture: Large n assumption
 Q&A
 Break (5 minutes)
Standard error of proportion and variance (15 minutes)
 Lecture: Standard error of proportion; the rule of three; the standard error of variance estimate
 Hands-on exercise: Change variables to interact with SEP figures
Hypothesis testing for counts (10 minutes)
 Lecture: The chi-squared hypothesis test
 Group discussion: Where else would you use a chi-squared test?
Bootstrapping (10 minutes)
 Lecture: Subsampling data
Determining distributions (5 minutes)
 Lecture: Data matching distribution
 Hands-on activity: Apply the Kolmogorov-Smirnov test
Wrap-up and Q&A (10 minutes)