Advanced Machine Learning with scikit-learn

Video Description

In this Advanced Machine Learning with scikit-learn training course, expert author Andreas Mueller teaches you how to choose and evaluate machine learning models. The course is designed for users who already have experience with Python.

You will start by learning about model complexity, overfitting, and underfitting. From there, Andreas covers pipelines, advanced metrics and imbalanced classes, and model selection for unsupervised learning. The course also covers dealing with categorical variables, dictionaries, and incomplete data, as well as handling text data. Finally, you will learn about out-of-core learning, including the scikit-learn interface for out-of-core learning and kernel approximations for large-scale non-linear classification.

Once you have completed this computer-based training course, you will know how to choose and evaluate machine learning models. Working files are included, allowing you to follow along with the author throughout the lessons.

Table of Contents

  1. Introduction
    1. What To Expect And About The Author 00:03:47
    2. Setup 00:02:14
    3. The Classifier Interface 00:08:29
    4. The Regressor Interface 00:02:53
    5. The Transformer Interface 00:02:11
    6. The Cluster Interface 00:06:04
    7. The Manifold Interface 00:03:32
    8. scikit-learn Interface Summary 00:04:01
    9. Cross-Validation With cross_val_score 00:06:19
    10. Parameter Searches With GridSearchCV 00:06:14
  2. Model Complexity, Overfitting And Underfitting
    1. What Is Model Complexity And Overfitting? 00:02:58
    2. Linear Models In-Depth 00:11:06
    3. Kernel SVMs In-Depth 00:07:40
    4. Random Forests In-Depth 00:06:02
    5. Learning Curves For Analyzing Model Complexity 00:03:54
    6. Validation Curves For Analyzing Model Parameters 00:02:30
    7. Efficient Parameter Search With EstimatorCV Objects 00:05:13
  3. Pipelines
    1. Motivation Of Using Pipelines 00:03:09
    2. Defining A Pipeline And Basic Usage 00:06:30
    3. Cross-Validation With Pipelines 00:02:32
    4. Parameter Selection With Pipelines 00:04:37
  4. Advanced Metrics And Imbalanced Classes
    1. Be Mindful Of Default Metrics 00:07:04
    2. More Evaluation Methods For Classification 00:05:17
    3. AUC 00:06:45
    4. Defining Custom Metrics 00:05:39
  5. Model Selection For Unsupervised Learning
    1. Guidelines For Unsupervised Model Selection 00:06:52
    2. Model Selection For Density Models 00:05:53
    3. Model Selection For Clustering 00:04:44
  6. Dealing With Categorical Variables, Dictionaries, And Incomplete Data
    1. Why Real Data Is Messy 00:06:24
    2. One-Hot Encoding For Categorical Data 00:06:24
    3. Working With Dictionaries 00:02:01
    4. Handling Incomplete Data 00:04:15
  7. Handling Text Data
    1. Motivation 00:02:51
    2. Bag-Of-Words Representations 00:06:48
    3. Text Classification For Sentiment Analysis - Part 1 00:07:25
    4. Text Classification For Sentiment Analysis - Part 2 00:04:01
    5. The Hashing Trick 00:03:25
    6. Other Representations - Distributed Word Representations 00:02:38
  8. Out-Of-Core Learning
    1. The Trade-Offs Of Out-Of-Core Learning 00:04:43
    2. The scikit-learn Interface For Out-Of-Core Learning 00:05:13
    3. Kernel Approximations For Large-Scale Non-Linear Classification 00:05:06
    4. Subsample And Transform - Supervised Transformations For Out-Of-Core Learning 00:05:35
    5. Application - Out-Of-Core Text Classification 00:04:58
  9. Conclusion
    1. Summary 00:03:29
    2. Where To Go From Here 00:03:26