Live online training

Hands-on Applied Regression

Learn end-to-end regression with Python

Topic: Data
Matt Harrison

Regression models predict numeric values, from housing prices to sales figures. Join expert Matt Harrison to learn how to use Jupyter to build predictive models that output a number rather than a category. You’ll work through regression models, starting with linear regression, then learn how to create your own models, evaluate and tune them, and explain them. Along the way, you’ll look at decision trees and the popular XGBoost algorithm.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • How to use the scikit-learn library
  • How to evaluate regression results
  • How to explain regression models

And you’ll be able to:

  • Write Python code to create a regression model
  • Determine the efficacy of your model

This training course is for you because...

  • You’re a Python programmer who wants to learn how to create regression models.
  • You’re a data scientist or analyst who wants to learn about end-to-end regression.

Prerequisites

  • Experience programming with Python and manipulating data with pandas (This class uses pandas DataFrames as inputs to the example model but doesn’t go deep into pandas.)
  • No setup needed (The exercises will be provided in Jupyter notebooks, but if you want to download and run the code locally, you’ll need to install Python 3+, Jupyter, scikit-learn, Yellowbrick, XGBoost, and pandas.)

Recommended preparation:

Recommended follow-up:

About your instructor

  • Matt runs MetaSnake, a Python and Data Science training and consulting company. He has over 15 years of experience using Python across a breadth of domains: Data Science, BI, Storage, Testing and Automation, Open Source Stack Management, and Search.

Schedule

The timeframes are only estimates and may vary depending on how the class is progressing.

Introduction to Jupyter (10 minutes)

  • Presentation: Overview of Jupyter concepts and use

Linear regression (30 minutes)

  • Presentation: The linear regression model; pros and cons of linear regression
  • Jupyter Notebook exercise: Create a regression model using linear regression
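As a rough idea of what this exercise involves, the sketch below fits a linear regression with scikit-learn on a pandas DataFrame. The data is synthetic and the column names (`sqft`, `bedrooms`) are hypothetical; the course notebooks may use a different dataset and workflow.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not the course dataset):
# price = 50 * sqft + 10_000 * bedrooms + noise
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "sqft": rng.uniform(500, 3000, size=200),
    "bedrooms": rng.integers(1, 6, size=200),
})
y = 50 * X["sqft"] + 10_000 * X["bedrooms"] + rng.normal(0, 5_000, size=200)

# Fit an ordinary least squares model on the DataFrame.
model = LinearRegression()
model.fit(X, y)

# Map each learned coefficient back to its column name.
coefs = dict(zip(X.columns, model.coef_))
print(coefs)
print(model.intercept_)
```

The fitted coefficients should land close to the values used to generate the data, which is one reason linear regression is easy to interpret.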

Splitting data (25 minutes)

  • Presentation: Making sure that your model generalizes to the real world and doesn’t just memorize the data
  • Jupyter Notebook exercise: Split data into training and testing sets to ensure you aren’t leaking data
  • Break (5 minutes)
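A minimal sketch of the splitting step, assuming scikit-learn's `train_test_split` and synthetic data with hypothetical columns `x1` and `x2`. The model is fit only on the training rows and then scored on rows it has not seen.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration, not the course dataset).
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
y = 3 * X["x1"] - 2 * X["x2"] + rng.normal(scale=0.5, size=300)

# Hold out 25% of the rows; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit on the training rows only, so the test rows stay unseen.
model = LinearRegression().fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)  # R^2 on data the model has seen
test_r2 = model.score(X_test, y_test)     # R^2 on held-out data
print(train_r2, test_r2)
```

A large gap between the training and test scores would suggest the model is memorizing rather than generalizing.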

Model evaluation (25 minutes)

  • Presentation: Creating a decision tree model; reviewing metrics to determine how effective the two models are
  • Jupyter Notebook exercise: Explore regression metrics to understand the performance of your model
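One way to compare two models on common regression metrics is sketched below, using scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score`. The data is synthetic, and the choice of metrics and the tree depth are assumptions rather than the course's exact setup.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration, not the course dataset).
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=400), "x2": rng.normal(size=400)})
y = 3 * X["x1"] - 2 * X["x2"] + rng.normal(scale=0.5, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=42),
}

# Fit each model on the training set and score it on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": r2_score(y_test, pred),
    }

# One row per model, one column per metric.
print(pd.DataFrame(results).T)
```

MAE and RMSE are in the units of the target, while R² is unitless, so it often helps to report more than one metric.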

Tuning the model (25 minutes)

  • Presentation: Hyperparameters of decision trees; how to tune them
  • Jupyter Notebook exercise: Adjust parameters of the model to explore effects on the results of prediction
  • Break (5 minutes)
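Hyperparameter tuning for a decision tree could look something like the sketch below, which uses scikit-learn's `GridSearchCV` with cross-validation. The grid values, the scoring choice, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear data (an assumption, not the course dataset).
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.uniform(-3, 3, size=400),
                  "x2": rng.uniform(-3, 3, size=400)})
y = 3 * np.sin(X["x1"]) + X["x2"] ** 2 + rng.normal(scale=0.3, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Candidate hyperparameter values; every combination is tried.
param_grid = {
    "max_depth": [2, 4, 6, None],
    "min_samples_leaf": [1, 5, 20],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    cv=5,                                # 5-fold cross-validation on the training set
    scoring="neg_mean_absolute_error",   # higher (closer to 0) is better
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.best_estimator_.score(X_test, y_test)
print(best_params, test_r2)
```

The test set is used only once, after the search has picked its parameters from cross-validation on the training data.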

Explaining the model (25 minutes)

  • Presentation: Explaining a model to a customer or boss after you’ve created it
  • Jupyter Notebook exercise: Determine the features of the data that have the greatest impact on your model prediction
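The course description does not say which explanation technique is used. As one possible approach, the sketch below ranks features with scikit-learn's `permutation_importance`, which measures how much the test score drops when a single column is shuffled. The data and the column names (`strong`, `weak`, `noise`) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption): the target depends heavily on "strong",
# slightly on "weak", and not at all on "noise".
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "strong": rng.normal(size=500),
    "weak": rng.normal(size=500),
    "noise": rng.normal(size=500),
})
y = 5 * X["strong"] + 0.5 * X["weak"] + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = DecisionTreeRegressor(max_depth=5, random_state=42).fit(X_train, y_train)

# Shuffle each column in turn and record the average drop in test R^2.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)
importances = pd.Series(result.importances_mean, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

A ranked list like this can be a starting point for a conversation with a customer or manager about what drives the predictions.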

XGBoost (30 minutes)

  • Presentation: Overview of XGBoost, an algorithm that builds on decision trees to provide robust models
  • Jupyter Notebook exercise: Evaluate the XGBoost model on your regression problem