Live Online Training

Introduction to Statistics for Data Analysis with Python

Learn the fundamentals of statistics by answering real-world questions

Topic: Data
Harshit Tyagi

This training session focuses on implementing the fundamental statistical concepts that every data scientist needs. We'll see how statistics lets us derive insights from raw data to answer real-world questions. For aspiring data scientists, statistics opens the door to every major domain that makes use of data science.

What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

  • Data exploration and visualization
  • Fundamentals of descriptive statistics: mean, median, mode, measures of spread, standard deviation, percentile, variance, skewness, correlation, etc.
  • Inferential statistics - basic principles behind using data for estimation and for assessing theories

And you’ll be able to:

  • Explore the data using statistics.
  • Build statistical models.

This training course is for you because...

  • You are a programmer or an aspiring data analyst/scientist.
  • You are a beginner in the field of data, ML, or AI with some familiarity with elementary mathematics and Python programming.


Prerequisites

  • Python programming, pandas, Matplotlib
  • Basic mathematics
  • No prior experience with statistics necessary

About your instructor

  • Harshit Tyagi is a full stack developer and data engineer at Elucidata, a biotech company based in Cambridge, where he develops algorithms for research scientists at some of the world’s best medical schools, including Yale, UCLA, and MIT. Previously, he was a systems development engineer at the investment management firm Tradelogic, where he designed a framework to analyze financial news from prominent sources to produce accurate trading signals. He’s a Python evangelist and loves to contribute to tech communities, including Google Developers Groups and Python Delhi User Groups, as well as other online learning platforms.


Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction to Data Visualisation (50 mins)

  • Presentation (15 mins): Learn how to extract and explore data, and understand what different plots and charts represent.
  • Discussion (5 mins): Which libraries can we use in Python for plotting?
  • Presentation (15 mins): Overview of Python libraries for data analysis and plotting, including NumPy, pandas, statsmodels, Matplotlib, and seaborn.
  • Exercise (15 mins): Practice plotting and exploratory data analysis
  • Q&A (5 mins)
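
As a rough illustration of the kind of exploratory plotting this session covers, here is a minimal sketch using Matplotlib. The data is simulated and the variable names (`ad_spend`, `sales`) and the output file name are hypothetical, chosen only for this example; it is not taken from the course material.

```python
import random

import matplotlib.pyplot as plt

random.seed(0)

# Hypothetical, simulated data: 200 ad-spend values and the resulting sales.
ad_spend = [random.uniform(0, 100) for _ in range(200)]
sales = [50 + 2 * x + random.gauss(0, 20) for x in ad_spend]

# Three common exploratory views of the same data, side by side.
fig, (ax_hist, ax_box, ax_scatter) = plt.subplots(1, 3, figsize=(12, 4))

# Histogram: shows the shape of the distribution of one variable.
ax_hist.hist(sales, bins=20)
ax_hist.set_title("Histogram of sales")
ax_hist.set_xlabel("sales")

# Box plot: shows the median, quartiles, and potential outliers.
ax_box.boxplot(sales)
ax_box.set_title("Box plot of sales")

# Scatter plot: shows the relationship between two variables.
ax_scatter.scatter(ad_spend, sales, s=10)
ax_scatter.set_title("Sales vs. ad spend")
ax_scatter.set_xlabel("ad spend")
ax_scatter.set_ylabel("sales")

fig.tight_layout()
fig.savefig("eda_overview.png")  # hypothetical output file name
```

In an interactive session or notebook, calling `plt.show()` instead of saving the figure displays the plots directly.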

Introduction to Descriptive Statistics (50 mins)

  • Presentation (20 mins): Basics of descriptive statistics: central tendency (mean, median, mode), variance, standard deviation, etc.
  • Discussion (10 mins): How can we answer real-world questions using statistics? For example, who is the best football player in the world?
  • Presentation (15 mins): How does Netflix know what we like? - Percentile, variance, skewness, correlation.
  • Exercise (15 mins): Problem: Should we buy an extended warranty on electrical appliances?
  • Q&A (5 mins)
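
The measures named in this session can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the `goals` and `shots` samples are made up, the helper functions `skewness` and `pearson_r` are hypothetical names written for this sketch, and it uses the standard-library `statistics` module so that it runs without extra installs (in practice NumPy or pandas offer equivalent functions).

```python
import statistics as st

# Hypothetical data: goals and shots by one player across ten matches.
goals = [0, 1, 1, 2, 0, 3, 1, 0, 1, 4]
shots = [2, 3, 4, 5, 1, 7, 3, 2, 4, 8]

# Central tendency.
mean_goals = st.mean(goals)
median_goals = st.median(goals)
mode_goals = st.mode(goals)

# Spread: sample variance and standard deviation (n - 1 denominator).
var_goals = st.variance(goals)
std_goals = st.stdev(goals)

# Quartiles: the 25th, 50th, and 75th percentiles.
q1, q2, q3 = st.quantiles(goals, n=4)


def skewness(values):
    """Population skewness: mean cubed deviation divided by sigma cubed."""
    mu = st.fmean(values)
    sigma = st.pstdev(values)
    return sum((v - mu) ** 3 for v in values) / (len(values) * sigma ** 3)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = st.fmean(xs), st.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


print(f"mean={mean_goals}, median={median_goals}, mode={mode_goals}")
print(f"variance={var_goals:.3f}, std dev={std_goals:.3f}")
print(f"quartiles={q1}, {q2}, {q3}")
print(f"skewness={skewness(goals):.3f}")
print(f"correlation(goals, shots)={pearson_r(goals, shots):.3f}")
```

A positive skewness here reflects a few high-scoring matches pulling the mean above the median, and a correlation close to 1 indicates that matches with more shots tend to have more goals in this made-up sample.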

Basics of Inferential Statistics (60 mins)

  • Presentation (20 mins): Basic principles behind inferential statistics - analyzing categorical and qualitative data, constructing confidence intervals and sampling.
  • Codelab walkthrough (15 mins): Use NumPy, pandas, statsmodels, and seaborn to analyze case studies.
  • Exercise (15 mins): Use the concepts to work on an industry problem
  • Q&A (10 mins)
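
To make the idea of a confidence interval concrete, here is a minimal sketch that estimates a population mean from a small sample. The data (`months_to_repair`) is invented for illustration, and the interval uses a normal approximation, which is an assumption of this sketch rather than the method used in the course; with a sample this small, a t critical value (for example from `scipy.stats.t.ppf`) would give a slightly wider interval.

```python
import statistics as st

# Hypothetical sample: months until first repair for 12 appliances.
months_to_repair = [38, 45, 29, 52, 41, 36, 48, 33, 44, 39, 50, 37]

n = len(months_to_repair)
sample_mean = st.fmean(months_to_repair)

# Standard error of the mean: sample standard deviation over sqrt(n).
standard_error = st.stdev(months_to_repair) / n ** 0.5

# Critical value for a 95% two-sided interval from the standard normal
# distribution (assumption: normal approximation, see the note above).
z = st.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"sample mean = {sample_mean:.1f} months")
print(f"95% confidence interval = ({lower:.1f}, {upper:.1f})")
```

The interval is a statement about the estimation procedure: if we repeated the sampling many times, about 95% of the intervals built this way would contain the true population mean.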

Take-home exercise:

  1. Exercise: Create a statistical model to recommend the type of insurance to individuals based on their location, occupation, marital status, and many other features.