Tree Models in Python with Scikit-learn

Published by O'Reilly Media, Inc.

Build and tune decision trees, random forests, and XGBoost

What you’ll learn and how you can apply it

  • Build and interpret tree-based models to make predictions from tabular data
  • Compare and contrast the architecture and results of tree ensembles
  • Build and tune XGBoost models to improve performance and predictions

Course description

Author and educator Corey Wade guides you hands-on through the experience of building and optimizing tree-based models using scikit-learn with a focus on XGBoost. Trees occupy an essential middle ground in machine learning because they surpass linear models in complexity while offering better interpretability and efficiency than deep learning. Knowing when trees are preferable and how to maximize their results are essential skills for data scientists and machine learning developers.

You’ll code along with Corey and do exercises independently to gain fluency in building models that make predictions from real data. You’ll analyze and interpret model results, and you’ll use automated search techniques to fine-tune hyperparameters. Along the way you’ll discover ML best practices such as splitting data and using cross-validation to detect and limit overfitting. By the end of the course, you’ll be able to build, tune, and interpret high-performance tree-based models while being mindful of their strengths and weaknesses.

This live event is for you because...

  • You want to gain fluency building machine learning models in scikit-learn.
  • You’ve heard about XGBoost and would like to build an XGBoost model.
  • You want to become a data scientist or machine learning developer.

Prerequisites

  • Prior experience building a machine learning model in Python using linear or logistic regression
  • Fluency in Python at the level of accessing libraries and methods
  • A private Gmail account to access Google Colaboratory Notebooks

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Building decision trees (45 minutes)

  • Presentation: What are decision trees?; interpreting decision trees
  • Code-along: Classifying data with decision trees; splitting data; modifying max_depth
  • Hands-on exercise: Solve a regression problem with decision trees
  • Q&A
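
As a rough idea of what this code-along covers, here is a minimal sketch that splits data, fits a decision tree classifier, and varies max_depth. The dataset (scikit-learn's bundled breast cancer data), the split size, and the depth values are illustrative assumptions, not the course's actual materials.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: a small bundled dataset stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set so accuracy is measured on rows the tree never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Compare a stump, a shallow tree, and a fully grown tree (max_depth=None).
scores = {}
for depth in (1, 3, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.3f}, test={scores[depth][1]:.3f}")
```

A large gap between the train and test scores at a given depth is a sign that the tree is overfitting.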

Building random forests (45 minutes)

  • Presentation: What are random forests?
  • Code-along: Classifying data with random forests; cross-validation; modifying n_estimators with max_depth
  • Hands-on exercise: Solve a regression problem with random forests
  • Q&A
  • Break
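
The random forest code-along can be pictured with a short sketch like the one below, which scores a few n_estimators and max_depth combinations using 5-fold cross-validation. The dataset and the specific parameter pairs are assumptions chosen for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumption: a small bundled dataset stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)

# Each pair is (n_estimators, max_depth); values are illustrative.
results = {}
for n_estimators, max_depth in [(10, 2), (50, 2), (50, None)]:
    forest = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    )
    # cross_val_score fits and scores the model on 5 different train/test folds.
    cv_scores = cross_val_score(forest, X, y, cv=5)
    results[(n_estimators, max_depth)] = cv_scores.mean()
    print(f"n_estimators={n_estimators}, max_depth={max_depth}: "
          f"mean accuracy={cv_scores.mean():.3f}")
```

Averaging over folds gives a steadier estimate than a single train/test split, which makes comparisons between settings more trustworthy.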

Building models with extreme gradient boosting (45 minutes)

  • Presentation: What is extreme gradient boosting?
  • Code-along: Classify data with XGBoost; modify column and row sampling parameters; a brief survey of the LightGBM and CatBoost alternatives
  • Hands-on exercise: Solve a regression problem with gradient boosted trees
  • Q&A
  • Break
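
To give a feel for row and column sampling in boosted trees, the sketch below uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost so that it runs without extra installs. This is a swapped-in estimator, not the course's code: its subsample parameter samples rows per tree and max_features samples columns per split, which is only roughly analogous to XGBoost's subsample and colsample_bytree. All values shown are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Assumption: a small bundled dataset stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Stand-in for an XGBoost classifier (scikit-learn's own gradient boosting).
booster = GradientBoostingClassifier(
    n_estimators=100,    # number of boosting rounds (trees)
    learning_rate=0.1,   # shrinks each tree's contribution
    max_depth=3,         # depth of each individual tree
    subsample=0.8,       # row sampling: each tree sees 80% of the rows
    max_features=0.5,    # column sampling: each split considers 50% of the features
    random_state=0,
)
booster.fit(X_train, y_train)
test_accuracy = booster.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Sampling rows and columns adds randomness between trees, which often reduces overfitting at a small cost in training fit.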

Fine-tuning trees (45 minutes)

  • Presentation: Ranges and meaning of XGBoost hyperparameters
  • Code-along: Modify XGBoost hyperparameters with GridSearchCV; modify XGBoost hyperparameters with RandomizedSearchCV
  • Hands-on exercise: Fine-tune the best tree models for a regression problem
  • Q&A
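
The two search strategies named above can be sketched as follows. GridSearchCV tries every combination in a fixed grid, while RandomizedSearchCV samples a set number of combinations from distributions. As an assumption made to keep the example self-contained, scikit-learn's GradientBoostingRegressor stands in for an XGBoost regressor; the dataset, grids, and ranges are illustrative, not recommended values.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Assumption: a small bundled regression dataset stands in for the course data.
X, y = load_diabetes(return_X_y=True)

# Exhaustive search: 2 x 2 x 2 = 8 combinations, each scored with 3-fold CV.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
grid_search = GridSearchCV(
    GradientBoostingRegressor(random_state=0), param_grid, cv=3
)
grid_search.fit(X, y)
print("grid best:", grid_search.best_params_, round(grid_search.best_score_, 3))

# Randomized search: 5 combinations drawn from the distributions below.
param_distributions = {
    "n_estimators": randint(50, 150),        # integers in [50, 150)
    "max_depth": randint(2, 5),              # integers in [2, 5)
    "learning_rate": uniform(0.01, 0.19),    # floats in [0.01, 0.20]
}
random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=5,
    cv=3,
    random_state=0,
)
random_search.fit(X, y)
print("random best:", random_search.best_params_, round(random_search.best_score_, 3))
```

Randomized search usually scales better when there are many hyperparameters, because the number of fits is set by n_iter rather than by the size of the grid.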

Your Instructor

  • Corey Wade

    Corey Wade, MS Mathematics, MFA Writing & Consciousness, started Berkeley Coding Academy in 2020 to bring Python, Data Science, Machine Learning, and AI to a larger audience. Corey also teaches Math, Programming, and Data Science at the Independent Study Program of Berkeley High School, where he has worked since 2004. A Springboard Data Science graduate, Corey has worked in industry developing Data Science curricula for Pathstream and Hello World. A multiple grant award winner, lead author of The Python Workshop (2019, 2022), and author of Hands-on Gradient Boosting with XGBoost and Scikit-learn (2020), Corey’s passion for education comes from teaching advanced technical concepts to students of all ages. When not coding or teaching, Corey reads poetry and studies the stars.

Skills covered

  • Scikit-learn
  • Data Visualization
  • Ensemble Learning
  • Data Science