Linear Regression with Python: Essential Math for Data Science
Take control of your data by honing your fundamental math skills
Linear regression is a simple but often powerful tool for quantifying the relationship between a value you want to predict and a set of explanatory variables. Once that relationship has been established, you can analyze it further, for example by measuring how strongly each explanatory variable affects the predicted value. A solid understanding of the linear regression model is a prerequisite for learning more powerful machine learning models.
This is the second course in a four-part series focused on essential math topics. The courses are grouped in pairs that follow a natural progression.
What you'll learn and how you can apply it
By the end of this live, hands-on, online course, you’ll understand:
 How linear regression works, and its limitations
 Model fitting metrics, like mean squared error, used to determine how well a model works
 Variations of standard linear regression, like Ridge regression, and how they prevent overfitting
And you’ll be able to:
 Apply linear regression to a dataset to create a predictive model
 Quantify how well a linear regression model is performing
 Determine which variables have the greatest impact on the predicted value
This training course is for you because...
 You are in a technical role and want the foundational knowledge to transition into a data scientist position
 You work with data and want to start building predictive models
 You want to become a data analyst or data scientist
Prerequisites
 Basic math: addition, subtraction, multiplication and division
 Basic understanding of linear algebra: matrices and vectors
 Basic Python: variable creation, conditional statements, functions, loops
Recommended preparation:
 Take Linear Algebra with Python (live online training course)
Recommended follow-up:
 Take Probability with Python (live online training course)
 Take Statistics and Hypothesis Testing with Python (live online training course)
About your instructor

Russell Martin is a Data Scientist in Residence at The Data Incubator. He received his PhD in Applied Mathematics from the Georgia Institute of Technology. Russ lived and worked in the UK for seventeen years, including at Warwick University and the University of Liverpool, where he taught in the Department of Computer Science. As a Data Scientist in Residence, Russ instructs Fellows in our Data Science Fellowship, teaches online courses, and leads trainings with our corporate partners.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing
Getting Started (5 minutes)
 Presentation: Introduction to Jupyter Notebook environment
 Pulse check: Everyone ready to get started?
Introduction to Linear Regression (5 minutes)
 Presentation: What is linear regression? What aspects of it are covered in this class?
Regression Metrics (20 minutes)
 Presentation: Mean squared error
 Presentation: Mean absolute error
 Exercise: Determining MSE for California Housing Data
 Presentation: R-squared, the coefficient of determination
 Presentation: Optimization
 Q&A and Discussion (10 minutes)
 Break (5 minutes)
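As a rough illustration of the metrics named in this segment, the sketch below computes mean squared error, mean absolute error, and R-squared with NumPy. The function names (`mse`, `mae`, `r2`) and the four sample values are made up for illustration; they are not taken from the course materials or the California Housing data.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical observed values and model predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

print(mse(y_true, y_pred))  # 0.375
print(mae(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))   # 0.925
```

MSE penalizes large errors more heavily than MAE because the residuals are squared, while R-squared compares the model against always predicting the mean.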
Creating a Linear Regressor in Scikit-Learn (15 minutes)
 Lecture: The Scikit-Learn workflow
 Exercise: Creating a regressor (Jupyter Notebook)
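A minimal sketch of the usual scikit-learn pattern (create an estimator, fit it, predict, score it) might look like the following. It assumes synthetic data generated with NumPy rather than the course's dataset, and the coefficients in `true_coef` are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for this sketch): y is a linear
# function of three features plus a little Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 4.0 + rng.normal(scale=0.1, size=200)

# Hold out a test set so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()       # 1. create the estimator
model.fit(X_train, y_train)      # 2. fit it to the training data
y_pred = model.predict(X_test)   # 3. predict on new data

test_mse = mean_squared_error(y_test, y_pred)
test_r2 = model.score(X_test, y_test)  # R-squared for regressors

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("test MSE:", test_mse, "test R^2:", test_r2)
```

The fitted `coef_` values indicate how much the prediction changes per unit change in each feature, which is one way to judge which variables matter most when the features are on comparable scales.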
Adding Features to Improve Performance (15 minutes)
 Lecture: How creating features can improve performance
 Exercise: How to augment the California Housing data (Jupyter Notebook)
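One common way to add features is to include polynomial terms of the existing ones. The sketch below is an assumed example on synthetic data (not the course's augmentation of the California Housing data): the target depends on the square of the input, so a plain linear fit does poorly until a squared feature is added with `PolynomialFeatures`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption): y depends on x squared, which a
# straight line in x cannot capture.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.2, size=300)

# Baseline: linear regression on the raw feature only.
plain = LinearRegression().fit(X, y)

# Augmented: add x^2 as a feature, then fit the same linear model.
augmented = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X, y)

plain_r2 = plain.score(X, y)
augmented_r2 = augmented.score(X, y)
print("R^2 with raw feature:", plain_r2)
print("R^2 with added x^2 feature:", augmented_r2)
```

The model is still linear in its coefficients; only the feature set has changed.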
Motivation of Regularization (10 minutes)
 Lecture: What is overfitting?
 Lecture: What is regularization and how does it prevent overfitting?
Regularization in Action (15 minutes)
 Lecture: How does Ridge Regression improve Linear Regression?
 Exercise: Setting up a Ridge Regressor in Scikit-Learn
 Q&A and Discussion (10 minutes)
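To make the effect of regularization concrete, here is a small sketch comparing ordinary least squares with scikit-learn's `Ridge` on synthetic data that has few samples relative to the number of features. The data, the choice of `alpha=10.0`, and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (an assumption): 30 samples, 20 features, but only
# the first three features actually influence the target.
rng = np.random.default_rng(2)
n_samples, n_features = 30, 20
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.0]
y = X @ true_coef + rng.normal(scale=1.0, size=n_samples)

# Ordinary least squares is free to fit noise with large coefficients.
ols = LinearRegression().fit(X, y)

# Ridge adds an L2 penalty, alpha * sum(coef ** 2), to the squared
# error, which shrinks the coefficients toward zero.
ridge = Ridge(alpha=10.0).fit(X, y)

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print("OLS coefficient norm:  ", ols_norm)
print("Ridge coefficient norm:", ridge_norm)
```

Larger values of `alpha` shrink the coefficients more strongly; in practice the value is usually chosen by cross-validation rather than fixed by hand.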
Next Steps: Stochastic Gradient Descent (10 minutes)
 Presentation: Working with large datasets
 Exercise: Example of an SGD pipeline
 Q&A
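One possible shape for an SGD pipeline is sketched below: features are standardized, then a linear model is fitted with stochastic gradient descent via scikit-learn's `SGDRegressor`. This is an assumed example on synthetic data, not the pipeline used in the course exercise; the feature scales and coefficients are invented.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): four features on very different
# scales, combined linearly with Gaussian noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
true_coef = np.array([1.5, -0.2, 0.03, 8.0])
y = X @ true_coef + 2.0 + rng.normal(scale=0.5, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# SGD is sensitive to feature scale, so standardize before fitting.
sgd_pipeline = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
sgd_pipeline.fit(X_train, y_train)

sgd_r2 = sgd_pipeline.score(X_test, y_test)
print("test R^2:", sgd_r2)
```

For data too large to hold in memory, `SGDRegressor` also provides a `partial_fit` method that updates the model one batch at a time.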