Live Online training

# Linear Regression with Python: Essential Math for Data Science

## What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

• How linear regression works, and its limitations
• Model fitting metrics, like mean squared error, used to determine how well a model works
• Variations of standard linear regression, like Ridge regression, and how they prevent overfitting

And you’ll be able to:

• Apply linear regression on a data set to create a predictive model
• Quantify how well a linear regression model is performing
• Determine which variables have the greatest impact on the predicted value

## This training course is for you because...

• You are in a technical role and want foundational knowledge to transition into a data scientist position
• You work with data and want to start building predictive models
• You want to become a data analyst or data scientist

Prerequisites

• Basic math: addition, subtraction, multiplication and division
• Basic understanding of linear algebra: matrices and vectors
• Basic Python: variable creation, conditional statements, functions, loops


## About your instructor

Michael holds a master’s degree in statistics and a bachelor’s degree in mathematics. His academic research areas ranged from computational paleobiology, where he developed software for measuring evidence for disparate evolutionary models based on fossil data, to music and AI, where he assisted in modeling musical data for a jazz improvisation robot.

In his current work, Michael teaches hands-on courses in data science as well as business-oriented topics in managing data science initiatives at the organizational level. Aside from teaching, he leads internal data science projects for Pragmatic Institute in support of the marketing and operations teams. In his free time, he applies his math and programming skills toward creating code-based visual art and design projects.

## Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Getting Started (5 minutes)

• Presentation: Introduction to Jupyter Notebook environment
• Pulse check: Everyone ready to get started?

Introduction to Linear Regression (5 minutes)

• Presentation: What is linear regression? What aspects of it are covered in this class?

Regression Metrics (20 minutes)

• Presentation: Mean squared error
• Presentation: Mean absolute error
• Exercise: Determining MSE for California Housing Data
• Presentation: R-squared, coefficient of determination
• Presentation: Optimization
• Q&A and Discussion (10 minutes)
• Break (5 minutes)
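
As a rough illustration of the metrics listed above, the sketch below computes mean squared error, mean absolute error, and R-squared in plain Python. The function names and the toy numbers are made up for this example and are not taken from the course notebooks.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions, chosen so the arithmetic is easy to check.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(mse, mae, r2)
```

Because MSE squares each error, it penalizes large misses more heavily than MAE does, which is one reason the two metrics can rank models differently.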

Creating a Linear Regressor in Scikit-Learn (15 minutes)

• Lecture: The Scikit-Learn workflow
• Exercise: Creating a regressor (Jupyter Notebook)
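
A minimal sketch of what a Scikit-Learn workflow can look like: split the data, fit a model, predict, and score. It assumes `numpy` and `scikit-learn` are installed, and it uses synthetic data generated on the spot (not the California Housing data used in class), so the variable names and coefficients here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on two features, plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(0, 0.1, size=200)

# Hold out a test set so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)          # learn coefficients and intercept
y_pred = model.predict(X_test)       # predict on held-out rows
test_mse = mean_squared_error(y_test, y_pred)

print(model.coef_, model.intercept_, test_mse)
```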

Adding Features to Improve Performance (15 minutes)

• Lecture: How creating features can improve performance
• Exercise: How to augment the California Housing data (Jupyter Notebook)

Motivation of Regularization (10 minutes)

• Lecture: What is overfitting?
• Lecture: What is regularization and how does it prevent overfitting?

Regularization in Action (15 minutes)

• Lecture: How does Ridge Regression improve Linear Regression?
• Exercise: Setting up a Ridge Regressor in Scikit-Learn
• Q&A and Discussion (10 minutes)
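
One way to see the effect of Ridge regularization is to compare coefficient sizes against ordinary least squares on data with two nearly identical features. The sketch below assumes `numpy` and `scikit-learn` are installed; the data, the `alpha=1.0` setting, and the variable names are arbitrary choices for illustration, not the course's exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two almost-duplicate features make ordinary least squares unstable.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)  # nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha sets the strength of the L2 penalty

# The L2 penalty shrinks the coefficient vector toward zero.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print(ols_norm, ridge_norm)
```

Larger values of `alpha` shrink the coefficients further, trading some fit on the training data for more stable estimates.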

Next Steps: Stochastic Gradient Descent (10 minutes)

• Presentation: Working with large datasets
• Exercise: Example of an SGD pipeline
• Q&A
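
For a sense of what an SGD pipeline might look like, the sketch below chains feature scaling with `SGDRegressor` using `make_pipeline`. It assumes `numpy` and `scikit-learn` are installed; the synthetic data and the hyperparameters are illustrative assumptions rather than the configuration used in the course.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with features on a large scale (0 to 100).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(1000, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 10.0 + rng.normal(0, 1.0, size=1000)

# SGD is sensitive to feature scale, so standardize before fitting.
pipeline = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
pipeline.fit(X, y)

r2 = pipeline.score(X, y)  # R-squared on the training data
print(r2)
```

Putting the scaler inside the pipeline means the same scaling learned during `fit` is applied automatically whenever the pipeline predicts on new data.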