Hands-on Applied Classification
Learn end-to-end classification with Python
Classification models are used in industry to predict fraud, churn, cancer, and more. Creating a model is easy, but the work doesn’t stop there.
Join expert Matt Harrison to learn how to leverage Jupyter to demo predictive models that assign labels to data. Using decision trees, you’ll create your own models, then learn how to evaluate them to understand how well the models will perform in the wild; tune the characteristics of the model to be more or less lenient in assigning labels; and explain your models to other stakeholders. Along the way, you’ll explore the popular XGBoost algorithm.
What you'll learn and how you can apply it
By the end of this live online course, you’ll understand:
- How to use scikit-learn, a popular open source library whose interface many other Python libraries follow
- How to evaluate classification results to determine how well a model performs
- How to explain classification models to decision makers and end users so they understand which data points drive decisions
And you’ll be able to:
- Write Python code to create a classification model
- Determine the efficacy of your model
This training course is for you because...
- You’re a Python programmer who wants to learn how to create classification models.
- You’re a data scientist or analyst who wants to learn about end-to-end classification.
Prerequisites
- Experience programming with Python and manipulating data with pandas (This class uses pandas DataFrames as inputs to the example model but doesn’t go deep into pandas.)
- No setup needed (The exercises will be provided in Jupyter notebooks, but if you want to download and run the code locally, you’ll need to install Python 3+, Jupyter, scikit-learn, Yellowbrick, XGBoost, and pandas.)
Recommended preparation
- Take Getting started with Python 3 (live online training course with Matt Harrison)
- Take Getting started with Pandas (live online training course with Matt Harrison)
About your instructor
Matt runs MetaSnake, a Python and Data Science training and consulting company. He has over 15 years of experience using Python across a breadth of domains: Data Science, BI, Storage, Testing and Automation, Open Source Stack Management, and Search.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction to Jupyter (10 minutes)
- Presentation: Overview of Jupyter concepts and use
Decision trees (30 minutes)
- Presentation: Overview of decision tree models; the pros and cons of decision trees
- Jupyter Notebook exercise: Create a decision tree model for classification
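To give a feel for this exercise, here is a minimal sketch of fitting a decision tree classifier with scikit-learn. The iris dataset and the `max_depth=3` setting are assumptions chosen for illustration; the course notebooks may use different data and settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset as pandas objects
# (an illustrative stand-in, not the course data).
iris = load_iris(as_frame=True)
X = iris.data    # DataFrame with four numeric feature columns
y = iris.target  # Series of integer class labels (0, 1, 2)

# Limit the depth so the tree stays small and easy to inspect.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

# Predict labels for the first five rows.
predictions = model.predict(X.head())
print(predictions)
```

Note that this sketch predicts on rows the model has already seen, so it says nothing yet about how the model performs on new data.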
Splitting data (25 minutes)
- Presentation: Making sure that your model generalizes to the real world and doesn’t just memorize the data
- Jupyter Notebook exercise: Split data to ensure validation results aren’t leaking data
- Break (5 minutes)
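One common way to check that a model generalizes rather than memorizes is to hold out part of the data before training. The sketch below assumes scikit-learn's built-in breast cancer dataset and a 70/30 split; both are illustrative choices, not necessarily what the course uses.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Hold out 30% of the rows for testing. stratify=y keeps the class
# proportions similar in both parts; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit only on the training rows so the test rows stay unseen.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

A large gap between the two scores is a hint that the model is memorizing the training data.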
Model evaluation (25 minutes)
- Presentation: Metrics to determine how effective your model is
- Jupyter Notebook exercise: Explore metrics on the model you’ve created
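Accuracy alone can hide how a classifier fails, so it often helps to look at a confusion matrix along with precision and recall. The following is a rough sketch under the same assumptions as before (built-in breast cancer dataset, a shallow tree); the metrics covered in class may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, y_pred)

# Precision: of the rows predicted as class 1, how many really are class 1?
# Recall: of the rows that really are class 1, how many did the model find?
# (In this dataset, label 1 means "benign".)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print(cm)
print(f"precision: {precision:.3f}, recall: {recall:.3f}")
```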
Tuning the model (25 minutes)
- Presentation: Hyperparameters of decision trees; how to tune them
- Jupyter Notebook exercise: Tweak hyperparameters to understand how model performance changes
- Break (5 minutes)
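Hyperparameter tuning can be sketched as a cross-validated grid search. The grid values below (`max_depth` and `min_samples_leaf`) are arbitrary examples picked for illustration, and `GridSearchCV` is only one of several ways to search; the course may take a different approach.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Candidate values to try (illustrative, not recommended defaults).
param_grid = {
    "max_depth": [2, 3, 5, 8],
    "min_samples_leaf": [1, 5, 20],
}

# Evaluate every combination with 5-fold cross-validation on the
# training rows only, so the test rows stay untouched.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
print(f"held-out test accuracy: {search.score(X_test, y_test):.3f}")
```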
Explaining the model (25 minutes)
- Presentation: Explaining a model to a customer or boss after you’ve created it
- Jupyter Notebook exercise: List important features of the data that determine model output
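One simple way to list the features that drive a tree's decisions is the fitted model's `feature_importances_` attribute. This is a minimal sketch on the built-in breast cancer dataset (an assumption for illustration); the course may use other explanation tools as well.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

# Pair each importance score with its column name and sort descending.
# The scores are normalized so that they sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

print(importances.head(5))
```

Features the tree never splits on receive an importance of 0, which is expected for a shallow tree over many columns.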
XGBoost (30 minutes)
- Presentation: Overview of XGBoost, an algorithm that builds on decision trees to provide robust models
- Jupyter Notebook exercise: Explore XGBoost performance against a plain decision tree