Introduction to Statistics for Data Analysis with Python
Learn the fundamentals of statistics by answering real-world questions
Topic: Data
This training session focuses on implementing the fundamental statistical concepts that every data scientist needs. We'll see how statistics enables us to derive insights from raw information and answer real-world questions. For every aspiring data scientist, statistics opens the door to all the major domains that make use of data science.
What you'll learn and how you can apply it
By the end of this live, online course, you’ll understand:
 Data exploration and visualization
 Fundamentals of descriptive statistics: mean, median, mode, measures of spread, standard deviation, percentile, variance, skewness, correlation, etc.
 Inferential statistics: basic principles behind using data for estimation and for assessing theories
And you’ll be able to:
 Explore data using statistics
 Build statistical models
This training course is for you because...
 You are a programmer or an aspiring data analyst/scientist.
 You are a beginner in the field of data/ML/AI with some familiarity with elementary mathematics and Python programming.
Prerequisites
 Python Programming, Pandas, Matplotlib
 Basic Mathematics
 No prior experience with statistics necessary
About your instructor

Harshit Tyagi is a full stack developer and data engineer at Elucidata, a biotech company based in Cambridge, where he develops algorithms for research scientists at some of the world’s best medical schools, including Yale, UCLA, and MIT. Previously, he was a systems development engineer at the investment management firm Tradelogic, where he designed a framework to analyze financial news from prominent sources to produce accurate trading signals. He’s a Python evangelist and loves to contribute to tech communities, including Google Developers Groups and Python Delhi User Groups, as well as other online learning platforms.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction to Data Visualisation (50 mins)
 Presentation (15 mins): Learning how to extract and explore data, and understanding what different plots and charts represent.
 Discussion (5 mins): Which libraries can we use in Python for plotting?
 Presentation (15 mins): Overview of Python libraries for data analysis and plotting, including NumPy, pandas, statsmodels, Matplotlib, and seaborn.
 Exercise (15 mins): Practice plotting and exploratory data analysis
 Q&A (5 mins)
Introduction to Descriptive Statistics (50 mins)
 Presentation (20 mins): Basics of descriptive statistics: mean, median, mode, variance, standard deviation, central tendency, etc.
 Discussion (10 mins): How can we answer real-world questions using statistics? Example: Who is the best football player in the world?
 Presentation (15 mins): How does Netflix know what we like? Percentile, variance, skewness, correlation.
 Exercise (15 mins): Problem: Should we buy an extended warranty on electrical appliances?
 Q&A (5 mins)
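The descriptive measures named in this section can be tried out ahead of the session with a minimal sketch like the one below. It is not course material: the sample values are made up, the helper names `skewness` and `pearson` are hypothetical, and it relies only on Python's standard `statistics` module rather than the pandas and NumPy tools the course lists.

```python
import math
import statistics

# Hypothetical sample (made-up values, e.g. appliance lifetimes in years).
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)                 # central tendency
median = statistics.median(data)
mode = statistics.mode(data)
variance = statistics.variance(data)         # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)             # sample standard deviation
quartiles = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles


def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    mu = statistics.fmean(values)
    m2 = statistics.fmean([(v - mu) ** 2 for v in values])
    m3 = statistics.fmean([(v - mu) ** 3 for v in values])
    return m3 / m2 ** 1.5


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sx * sy)


# A second made-up variable that is an exact linear function of the first,
# so the correlation is as strong as it can be.
paired = [2 * x + 1 for x in data]

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", variance, "std dev:", round(std_dev, 3))
print("quartiles:", quartiles)
print("skewness:", round(skewness(data), 3))
print("correlation:", round(pearson(data, paired), 3))
```

A positive skewness here reflects the longer tail toward the larger values. With pandas, `Series.describe()`, `Series.skew()`, and `Series.corr()` cover similar ground, though `Series.skew()` applies a bias correction and so will not match the moment-based value above exactly.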
Basics of Inferential Statistics (60 mins)
 Presentation (20 mins): Basic principles behind inferential statistics: analyzing categorical and qualitative data, constructing confidence intervals, and sampling.
 Codelab walkthrough (15 mins): Use NumPy, pandas, statsmodels, and seaborn to analyse case studies.
 Exercise (15 mins): Use the concepts to work on an industry problem
 Q&A (10 mins)
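Constructing a confidence interval, one of the topics listed above, can be sketched as follows. This is an illustration under stated assumptions rather than the course's own codelab: the sample is made up, `mean_confidence_interval` is a hypothetical helper, and the interval uses a normal approximation. For a sample this small, a t-distribution (available in SciPy or statsmodels) is generally more appropriate and gives a somewhat wider interval.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the population mean, normal approximation.

    Hypothetical helper for illustration only.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_err
    return mean - margin, mean + margin


# Made-up sample values, used only to exercise the function.
sample = [2, 4, 4, 5, 7, 9, 11]

low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")

low99, high99 = mean_confidence_interval(sample, confidence=0.99)
print(f"99% CI for the mean: ({low99:.2f}, {high99:.2f})")
```

Asking for higher confidence widens the interval, which is the trade-off between certainty and precision that sampling introduces.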
Take-home exercise:
 Exercise: Create a statistical model to recommend the type of insurance to individuals based on their location, occupation, marital status, and many other features.