Live Online training

# Introduction to Statistics for Data Analysis with Python

## What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

• Data exploration and visualization
• Fundamentals of descriptive statistics: mean, median, mode, measures of spread, standard deviation, percentiles, variance, skewness, correlation, etc.
• Inferential statistics - basic principles behind using data for estimation and for assessing theories

And you’ll be able to:

• Explore data using statistics.
• Build statistical models.

## This training course is for you because...

• You are a programmer or an aspiring data analyst/scientist.
• You are a beginner in data, ML, or AI with some familiarity with elementary mathematics and Python programming.

Prerequisites

• Python Programming, Pandas, Matplotlib
• Basic Mathematics
• No prior experience with statistics necessary

• Harshit Tyagi is a Full Stack Developer and Data Engineer at Elucidata, a Cambridge-based biotech company. He develops algorithms for research scientists at the world’s best medical schools, such as Yale, UCLA, and MIT. Before Elucidata, he worked as a Systems Development Engineer at Tradelogic, an investment management firm, where he designed a framework to analyze financial news from all prominent sources and produce accurate trading signals. He is a Python evangelist and loves to contribute to tech communities like Google Developers Groups, Python Delhi User Groups, and other e-learning platforms. Drawing on skills acquired over the years and more than 3 years as a mentor and reviewer in e-learning, he aims to share enterprise-grade practices that produce more skillful data scientists and quantitative traders.

## Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction to Data Visualisation (50 mins)

• Presentation (15 mins): Learning how to extract and explore data, and understanding what different plots and charts represent.
• Discussion (5 mins): Which libraries can we use in Python for plotting?
• Presentation (15 mins): Overview of Python libraries for analysis and plotting, including NumPy, pandas, statsmodels, Matplotlib, and seaborn.
• Exercise (15 mins): Practice plotting and exploratory data analysis.
• Q&A (5 mins)

Introduction to Descriptive Statistics (50 mins)

• Presentation (20 mins): Basics of descriptive statistics: central tendency (mean, median, mode), variance, standard deviation, etc.
• Discussion (10 mins): How can we answer real-world questions using statistics? For example, who is the best football player in the world?
• Presentation (15 mins): How does Netflix know what we like? Percentiles, variance, skewness, and correlation.
• Exercise (15 mins): Problem: Should we buy an extended warranty on electrical appliances?
• Q&A (5 mins)
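
As a taste of the measures covered in this section, here is a minimal sketch that uses only Python's standard library. The `goals` and `shots` samples are made up for illustration, and the `skewness` and `pearson` helpers are hypothetical names written for this sketch, not functions from the course material.

```python
import math
import statistics

# Hypothetical sample: goals scored by one player across ten matches.
goals = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]

mean = statistics.mean(goals)            # arithmetic average
median = statistics.median(goals)        # middle value of the sorted data
mode = statistics.mode(goals)            # most frequent value
variance = statistics.variance(goals)    # sample variance (n - 1 denominator)
std_dev = statistics.stdev(goals)        # sample standard deviation
quartiles = statistics.quantiles(goals, n=4)  # cut points at the 25th, 50th, 75th percentiles


def skewness(values):
    """Moment-based skewness: positive when the right tail is longer."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical second variable: shots taken in the same ten matches.
shots = [2, 3, 4, 5, 1, 7, 3, 2, 5, 4]

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, std_dev={std_dev:.3f}, quartiles={quartiles}")
print(f"skewness={skewness(goals):.3f}, correlation={pearson(goals, shots):.3f}")
```

In practice, pandas and NumPy offer equivalents of most of these measures; the hand-written helpers are here only to make the formulas visible.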

Basics of Inferential Statistics (60 mins)

• Presentation (20 mins): Basic principles behind inferential statistics: analyzing categorical and qualitative data, constructing confidence intervals, and sampling.
• Codelab walkthrough (15 mins): Use NumPy, pandas, statsmodels, and seaborn to analyze case studies.
• Exercise (15 mins): Use the concepts to work on an industry problem.
• Q&A (10 mins)
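
To illustrate the idea of a confidence interval mentioned above, the sketch below estimates a population mean from a sample using a normal approximation. The `lifetimes` data and the `mean_confidence_interval` helper are hypothetical, chosen for illustration only. For a sample this small, a t-based interval would be somewhat wider, so treat this as a rough starting point rather than a recommended method.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * std_err, sample_mean + z * std_err


# Hypothetical sample: appliance lifetimes in years.
lifetimes = [6.1, 7.4, 5.8, 8.2, 6.9, 7.1, 5.5, 7.8, 6.4, 7.0, 6.7, 7.3]

low, high = mean_confidence_interval(lifetimes)
print(f"95% interval for the mean lifetime: ({low:.2f}, {high:.2f}) years")
```

Raising `confidence` to 0.99 widens the interval: more certainty about capturing the true mean costs precision.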

Take-home exercise:

1. Exercise: Create a statistical model to recommend the type of insurance to individuals based on their location, occupation, marital status, and many other features.