Hands-On Ensemble Learning with Python

Book description

Combine popular machine learning techniques to create ensemble models using Python

Key Features

  • Implement ensemble models using algorithms such as random forests and AdaBoost
  • Apply boosting, bagging, and stacking ensemble methods to improve the prediction accuracy of your model
  • Explore real-world datasets and practical examples coded in scikit-learn and Keras

Book Description

Ensembling is the technique of combining two or more machine learning algorithms, whether similar or dissimilar, to create a model that delivers superior predictive power. This book demonstrates how you can use a variety of weak learners to build a strong predictive model.
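
As a quick sketch of this idea (the dataset and settings below are illustrative assumptions, not the book's own code), scikit-learn's AdaBoostClassifier can boost many weak decision stumps into one strong classifier:

    # A minimal sketch of the "weak learners, strong model" idea; the
    # dataset and hyperparameters are illustrative, not the book's own.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A single depth-1 tree (a decision "stump") is a weak learner on its own.
    stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
    print("Single stump accuracy: %.3f" % stump.score(X_test, y_test))

    # AdaBoost combines many such stumps (its default base learner)
    # into a much stronger ensemble.
    ensemble = AdaBoostClassifier(n_estimators=100, random_state=0)
    ensemble.fit(X_train, y_train)
    print("Boosted ensemble accuracy: %.3f" % ensemble.score(X_test, y_test))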

With its hands-on approach, you'll get up to speed not only with the basic theory but also with the application of different ensemble learning techniques. Using examples and real-world datasets, you'll be able to produce better machine learning models to solve supervised learning problems such as classification and regression. You'll then go on to apply ensemble learning techniques, such as consensus clustering, to produce unsupervised machine learning models. As you progress, the chapters cover machine learning algorithms that are widely used in practice to make predictions and classifications. Finally, you'll get to grips with Python libraries such as scikit-learn and Keras for implementing the different ensemble models.
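
To give a flavor of that workflow, here is a minimal voting-ensemble sketch in scikit-learn; the particular base learners and dataset are illustrative assumptions rather than examples taken from the book:

    # Combining three dissimilar learners by majority ("hard") vote;
    # the learners and dataset here are illustrative choices.
    from sklearn.datasets import load_digits
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_digits(return_X_y=True)

    voter = VotingClassifier(estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ], voting="hard")

    # Five-fold cross-validated accuracy of the combined model.
    print("CV accuracy: %.3f" % cross_val_score(voter, X, y, cv=5).mean())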

By the end of this book, you will be well versed in ensemble learning, able to judge which ensemble method a given problem calls for, and equipped to implement it successfully in real-world scenarios.

What you will learn

  • Implement ensemble methods to generate models with high accuracy
  • Overcome challenges such as bias and variance
  • Explore machine learning algorithms to evaluate model performance
  • Understand how to construct, evaluate, and apply ensemble models
  • Analyze tweets in real time using Twitter's streaming API
  • Use Keras to build an ensemble of neural networks for the MovieLens dataset (previewed in the sketch after this list)
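
As a taste of that last point, here is a minimal sketch of the dot-product style of Keras recommender the book works toward; the user/movie counts, embedding size, and the random stand-in training data are all illustrative assumptions:

    # A minimal dot-product recommender sketch in Keras; the sizes and
    # random training data are illustrative stand-ins, not MovieLens itself.
    import numpy as np
    from tensorflow.keras.layers import Dot, Embedding, Flatten, Input
    from tensorflow.keras.models import Model

    n_users, n_movies, emb_size = 1000, 1700, 8  # assumed sizes

    user_in = Input(shape=(1,))
    movie_in = Input(shape=(1,))
    user_vec = Flatten()(Embedding(n_users, emb_size)(user_in))
    movie_vec = Flatten()(Embedding(n_movies, emb_size)(movie_in))
    rating = Dot(axes=1)([user_vec, movie_vec])  # predicted rating score

    model = Model([user_in, movie_in], rating)
    model.compile(optimizer="adam", loss="mean_squared_error")

    # Train on (user_id, movie_id) -> rating triples; random data stands in.
    users = np.random.randint(0, n_users, size=64)
    movies = np.random.randint(0, n_movies, size=64)
    ratings = np.random.uniform(1.0, 5.0, size=64)
    model.fit([users, movies], ratings, epochs=1, verbose=0)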

Who this book is for

This book is for data analysts, data scientists, machine learning engineers, and other professionals who are looking to build advanced models using ensemble techniques. An understanding of Python and a basic knowledge of statistics are required to make the most of this book.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Ensemble Learning with Python
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the authors
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Code in action
      4. Conventions used
    4. Get in touch
      1. Reviews
  6. Section 1: Introduction and Required Software Tools
  7. A Machine Learning Refresher
    1. Technical requirements
    2. Learning from data
      1. Popular machine learning datasets
        1. Diabetes
        2. Breast cancer
        3. Handwritten digits
    3. Supervised and unsupervised learning
      1. Supervised learning
      2. Unsupervised learning
        1. Dimensionality reduction
    4. Performance measures
      1. Cost functions
        1. Mean absolute error
        2. Mean squared error
        3. Cross entropy loss
      2. Metrics
        1. Classification accuracy
        2. Confusion matrix
        3. Sensitivity, specificity, and area under the curve
        4. Precision, recall, and the F1 score
      3. Evaluating models
    5. Machine learning algorithms
      1. Python packages
      2. Supervised learning algorithms
        1. Regression
        2. Support vector machines
        3. Neural networks
        4. Decision trees
        5. K-Nearest Neighbors
        6. K-means
    6. Summary
  8. Getting Started with Ensemble Learning
    1. Technical requirements
    2. Bias, variance, and the trade-off
      1. What is bias?
      2. What is variance?
      3. Trade-off
    3. Ensemble learning
      1. Motivation
      2. Identifying bias and variance
        1. Validation curves
        2. Learning curves
      3. Ensemble methods
    4. Difficulties in ensemble learning
      1. Weak or noisy data
      2. Understanding interpretability
      3. Computational cost
      4. Choosing the right models
    5. Summary
  9. Section 2: Non-Generative Methods
  10. Voting
    1. Technical requirements
    2. Hard and soft voting
      1. Hard voting
      2. Soft voting
    3. Python implementation
      1. Custom hard voting implementation
        1. Analyzing our results using Python
    4. Using scikit-learn
      1. Hard voting implementation
      2. Soft voting implementation
        1. Analyzing our results
    5. Summary
  11. Stacking
    1. Technical requirements
    2. Meta-learning
      1. Stacking
      2. Creating metadata
    3. Deciding on an ensemble's composition
      1. Selecting base learners
      2. Selecting the meta-learner
    4. Python implementation
      1. Stacking for regression
      2. Stacking for classification
      3. Creating a stacking regressor class for scikit-learn
    5. Summary
  12. Section 3: Generative Methods
  13. Bagging
    1. Technical requirements
    2. Bootstrapping
      1. Creating bootstrap samples
    3. Bagging
      1. Creating base learners
      2. Strengths and weaknesses
    4. Python implementation
      1. Implementation
      2. Parallelizing the implementation
    5. Using scikit-learn
      1. Bagging for classification
      2. Bagging for regression
    6. Summary
  14. Boosting
    1. Technical requirements
    2. AdaBoost
      1. Weighted sampling
      2. Creating the ensemble
      3. Implementing AdaBoost in Python
      4. Strengths and weaknesses
    3. Gradient boosting
      1. Creating the ensemble
        1. Further reading
      2. Implementing gradient boosting in Python
    4. Using scikit-learn
      1. Using AdaBoost
      2. Using gradient boosting
    5. XGBoost
      1. Using XGBoost for regression
      2. Using XGBoost for classification
      3. Other boosting libraries
    6. Summary
  15. Random Forests
    1. Technical requirements
    2. Understanding random forest trees
      1. Building trees
      2. Illustrative example
      3. Extra trees
    3. Creating forests
      1. Analyzing forests
      2. Strengths and weaknesses
    4. Using scikit-learn
      1. Random forests for classification
      2. Random forests for regression
      3. Extra trees for classification
      4. Extra trees for regression
    5. Summary
  16. Section 4: Clustering
  17. Clustering
    1. Technical requirements
    2. Consensus clustering
      1. Hierarchical clustering
      2. K-means clustering
        1. Strengths and weaknesses
      3. Using scikit-learn
      4. Using voting
    3. Using OpenEnsembles
      1. Using graph closure and co-occurrence linkage
        1. Graph closure
        2. Co-occurrence matrix linkage
    4. Summary
  18. Section 5: Real World Applications
  19. Classifying Fraudulent Transactions
    1. Technical requirements
    2. Getting familiar with the dataset
    3. Exploratory analysis
      1. Evaluation methods
    4. Voting
      1. Testing the base learners
      2. Optimizing the decision tree
      3. Creating the ensemble
    5. Stacking
    6. Bagging
    7. Boosting
      1. XGBoost
    8. Using random forests
    9. Comparative analysis of ensembles
    10. Summary
  20. Predicting Bitcoin Prices
    1. Technical requirements
    2. Time series data
      1. Bitcoin data analysis
      2. Establishing a baseline
      3. The simulator
    3. Voting
      1. Improving voting
    4. Stacking
      1. Improving stacking
    5. Bagging
      1. Improving bagging
    6. Boosting
      1. Improving boosting
    7. Random forests
      1. Improving random forests
    8. Summary
  21. Evaluating Sentiment on Twitter
    1. Technical requirements
    2. Sentiment analysis tools
      1. Stemming
    3. Getting Twitter data
    4. Creating a model
    5. Classifying tweets in real time
    6. Summary
  22. Recommending Movies with Keras
    1. Technical requirements
    2. Demystifying recommendation systems
    3. Neural recommendation systems
    4. Using Keras for movie recommendations
      1. Creating the dot model
      2. Creating the dense model
      3. Creating a stacking ensemble
    5. Summary
  23. Clustering World Happiness
    1. Technical requirements
    2. Understanding the World Happiness Report
    3. Creating the ensemble
    4. Gaining insights
    5. Summary
  24. Another Book You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Ensemble Learning with Python
  • Author(s): George Kyriakides, Konstantinos G. Margaritis
  • Release date: July 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781789612851