O'Reilly Live Online Training

Hands-on Applied Classification

Learn end-to-end classification with Python

Topic: Data
Matt Harrison

Classification models are used in industry to predict fraud, churn, cancer, and more. Creating a model is easy, but the work doesn’t stop there.

Join expert Matt Harrison to learn how to leverage Jupyter to demo predictive models that assign labels to data. Using decision trees, you’ll create your own models, then learn how to evaluate them to understand how well the models will perform in the wild; tune the characteristics of the model to be more or less lenient in assigning labels; and explain your models to other stakeholders. Along the way, you’ll explore the popular XGBoost algorithm.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • How to use scikit-learn, a popular open source library whose interface many other Python machine learning libraries follow
  • How to evaluate classification results to determine how well a model performs
  • How to explain classification models to decision makers and end users so they understand which data points drive decisions

And you’ll be able to:

  • Write Python code to create a classification model
  • Determine the efficacy of your model

This training course is for you because...

  • You’re a Python programmer who wants to learn how to create classification models.
  • You’re a data scientist or analyst who wants to learn about end-to-end classification.

Prerequisites

  • Experience programming with Python and manipulating data with pandas (This class uses pandas DataFrames as inputs to the example model but doesn’t go deep into pandas.)
  • No setup needed (The exercises will be provided in Jupyter notebooks, but if you want to download and run the code locally, you’ll need to install Python 3+, Jupyter, scikit-learn, Yellowbrick, XGBoost, and pandas.)

About your instructor

  • Matt runs MetaSnake, a Python and Data Science training and consulting company. He has over 15 years of experience using Python across a breadth of domains: Data Science, BI, Storage, Testing and Automation, Open Source Stack Management, and Search.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction to Jupyter (10 minutes)

  • Presentation: Overview of Jupyter concepts and use

Decision trees (30 minutes)

  • Presentation: Overview of decision tree models; the pros and cons of decision trees
  • Jupyter Notebook exercise: Create a decision tree model for classification
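
As a rough idea of what creating a decision tree classifier can look like, here is a minimal sketch using scikit-learn. The dataset (scikit-learn's built-in breast cancer data) and the `max_depth=3` setting are assumptions chosen for illustration; the course notebooks may use different data and settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Built-in dataset, assumed here for illustration only.
# Features are numeric measurements; the label is 0 (malignant) or 1 (benign).
X, y = load_breast_cancer(return_X_y=True)

# A shallow tree is easier to inspect; random_state makes the result repeatable.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

# Predict labels for the first five rows (these rows were also used for training).
predictions = tree.predict(X[:5])
print(predictions)
```

Every scikit-learn classifier follows the same `fit` then `predict` pattern, so the sketch carries over to other model types.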

Splitting data (25 minutes)

  • Presentation: Making sure that your model generalizes to the real world and doesn’t just memorize the data
  • Jupyter Notebook exercise: Split data so that validation results aren’t inflated by data leakage
  • Break (5 minutes)
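
One common way to check that a model generalizes is to hold out part of the data and score the model only on rows it never saw during training. The sketch below assumes scikit-learn's `train_test_split` and the built-in breast cancer dataset; the 70/30 split ratio is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the rows; stratify keeps the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# With no depth limit, the tree is free to memorize the training rows.
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

# Accuracy on the training rows versus the held-out rows.
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

A large gap between the two numbers suggests the model has memorized the training data rather than learned a pattern that transfers.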

Model evaluation (25 minutes)

  • Presentation: Metrics to determine how effective your model is
  • Jupyter Notebook exercise: Explore metrics on the model you’ve created
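
To illustrate a few of the metrics that are commonly used for classifiers, the sketch below computes accuracy, precision, recall, and a confusion matrix on held-out data. Which metrics the course covers is not stated above, so this selection is an assumption, as are the dataset and the model settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

# Rows of the matrix are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)    # share of all predictions that are correct
precision = precision_score(y_test, y_pred)  # of rows predicted as 1, share truly 1
recall = recall_score(y_test, y_pred)        # of rows truly 1, share predicted as 1

print(cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Accuracy alone can mislead when one class is rare, which is why precision and recall are usually reported alongside it.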

Tuning the model (25 minutes)

  • Presentation: Hyperparameters of decision trees; how to tune them
  • Jupyter Notebook exercise: Tweak hyperparameters to understand how model performance changes
  • Break (5 minutes)
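
A simple way to see how a hyperparameter changes model performance is to train the same model with several values and compare scores on held-out data. The sketch below varies `max_depth`, one of several decision tree hyperparameters in scikit-learn; the specific values tried are arbitrary illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Map each depth to its (training accuracy, held-out accuracy) pair.
results = {}
for depth in [1, 2, 4, 8]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.3f} test={test_acc:.3f}")
```

For a more systematic search, scikit-learn's `GridSearchCV` automates this kind of loop with cross-validation.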

Explaining the model (25 minutes)

  • Presentation: Explaining a model to a customer or boss after you’ve created it
  • Jupyter Notebook exercise: List important features of the data that determine model output
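
One possible way to list the features that drive a tree's output is the `feature_importances_` attribute that scikit-learn sets on fitted tree models. This is only one explanation technique and is assumed here for illustration; the course may use other tools.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(data.data, data.target)

# Pair each feature name with its importance and keep the five largest.
# Importances are normalized, so they sum to 1 across all features.
ranked = sorted(
    zip(data.feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top_features = ranked[:5]

for name, importance in top_features:
    print(f"{name}: {importance:.3f}")
```

Features the tree never splits on receive an importance of zero, so a shallow tree typically highlights only a handful of columns.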

XGBoost (30 minutes)

  • Presentation: Overview of XGBoost, an algorithm that builds on decision trees to provide robust models
  • Jupyter Notebook exercise: Explore XGBoost performance against a plain decision tree
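
The comparison between a boosted model and a plain decision tree can be sketched as follows. Note the substitution: to avoid an extra dependency, this sketch uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost. Both build an ensemble of decision trees through gradient boosting, but they are different implementations, and results will differ. XGBoost's own `XGBClassifier` offers the same `fit` and `score` interface and could be swapped in if the library is installed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Baseline: a single decision tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# Stand-in for XGBoost (an assumption of this sketch): many small trees,
# each one fitted to correct the errors of the trees before it.
boosted = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=42)
boosted.fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
boosted_acc = boosted.score(X_test, y_test)
print(f"single tree: {tree_acc:.3f}, boosted ensemble: {boosted_acc:.3f}")
```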