O'Reilly Live Online Training

Advanced Machine Learning with scikit-learn

Robust models, textual analysis, and deeper feature analysis

David Mertz, Ph.D.

The algorithmic extraction of knowledge from data, known as machine learning, is the main goal of data analysis and prediction in both business and science. The tools for performing complex analysis of data, beyond the basic tools of statistics, have been steadily refined over the last two decades. Over a similar period, Python has grown to be the premier language for data science, and scikit-learn has grown to be the main toolkit used within Python for general-purpose machine learning.

What you'll learn, and how you can apply it

  • Feature extraction techniques for textual data
  • Feature selection classes (applicable to many types of problems)
  • Robust train/test splits and cross-validation
  • Value imputation
  • Specialized and custom metrics for model evaluation

This training course is for you because...

You are an aspiring or beginning data scientist. You should have a comfortable intermediate-level knowledge of Python and a very basic familiarity with statistics and linear algebra. College students or working programmers who are motivated to expand their skills to include machine learning with Python are a perfect fit.

Prerequisites

  • A first course in Python and/or working experience as a programmer
  • Basic college-level mathematics

Course Set-up

  • Students should have a system with Jupyter notebooks installed, along with a recent version of scikit-learn, pandas, NumPy, matplotlib, and the general scientific Python tool stack. The training materials will be made available as notebooks in a GitHub repository.
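
One way to confirm the set-up before class is a short version check. The sketch below is only an illustration, not part of the official course materials: the package list mirrors the bullet above, and using the `notebook` distribution to stand in for "Jupyter notebooks" is an assumption (JupyterLab users could substitute `jupyterlab`). The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names as published on PyPI. The list mirrors the set-up bullet;
# "notebook" is an assumed stand-in for the Jupyter requirement.
REQUIRED = ["scikit-learn", "pandas", "numpy", "matplotlib", "notebook"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(REQUIRED)
for name, version in versions.items():
    # Missing packages are flagged rather than raising an error.
    print(f"{name:<14}{version or 'NOT INSTALLED'}")
```

Any line reporting NOT INSTALLED points to a package that still needs to be added with your usual installer (for example pip or conda) before the session.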

Recommended Preparation

This advanced class is intended for those who have attended, or are already familiar with the topics covered in, the following classes (search O’Reilly Learning for dates):

  • Beginning Machine Learning with scikit-learn (Live Online Training) by David Mertz
  • Intermediate Machine Learning with scikit-learn (Live Online Training) by David Mertz

Recommended Follow-up

About your instructor

  • David Mertz is a data scientist, trainer, and erstwhile startup CTO, who is currently writing the Addison-Wesley title Cleaning Data for Successful Data Science: Doing the other 80% of the work. He created the training program for Anaconda, Inc. He was a Director of the Python Software Foundation for six years and remains chair of a few PSF committees. For nine years, David helped create the world's fastest (highly specialized) supercomputer for performing molecular dynamics.

Schedule

The timeframes are only estimates and may vary depending on how the class is progressing.

Feature extraction techniques for textual data (45 min)

Feature selection classes (45 min)

Break (5 min)

Robust train/test splits and cross-validation (45 min)

Value imputation (45 min)

Break (5 min)

Specialized and custom metrics for model evaluation (45 min)

Class wrap-up (5 min)