Machine Learning Fundamentals

Book description

With the flexibility and features of scikit-learn and Python, build machine learning algorithms that optimize the programming process and take application performance to a whole new level.

Key Features

  • Explore scikit-learn's uniform API and its application to any type of model
  • Understand the difference between supervised and unsupervised models
  • Learn how machine learning is used through real-world examples

Book Description

As machine learning algorithms grow in popularity, new tools that optimize them are being developed. Machine Learning Fundamentals explains how to use the syntax of scikit-learn. You'll study the difference between supervised and unsupervised models, as well as the importance of choosing the appropriate algorithm for each dataset. You'll apply unsupervised clustering algorithms to real-world datasets to discover patterns and profiles, and explore the process of solving an unsupervised machine learning problem.

The focus of the book then shifts to supervised learning algorithms. You'll learn to implement different supervised algorithms and develop neural network structures using the scikit-learn package. You'll also learn how to perform coherent result analysis to improve the performance of the algorithm by tuning hyperparameters.

By the end of this book, you will have gained all the skills required to start programming machine learning algorithms.

What you will learn

  • Understand the importance of data representation
  • Gain insights into the differences between supervised and unsupervised models
  • Explore data using the Matplotlib library
  • Study popular clustering algorithms, such as k-means, Mean-Shift, and DBSCAN
  • Measure model performance through different metrics
  • Implement a confusion matrix using scikit-learn
  • Study popular supervised learning algorithms, such as Naïve Bayes, Decision Tree, and SVM
  • Perform error analysis to improve the performance of the model
  • Learn to build a comprehensive machine learning program

Who this book is for

Machine Learning Fundamentals is designed for developers who are new to the field of machine learning and want to learn how to use the scikit-learn library to develop machine learning algorithms. You must have some knowledge and experience in Python programming, but you do not need any prior knowledge of scikit-learn or machine learning algorithms.

Table of contents

  1. Preface
    1. About the Book
      1. About the Author
      2. Objectives
      3. Audience
      4. Approach
      5. Minimum Hardware Requirements
      6. Software Requirements
      7. Installation and Setup
      8. Installing the Code Bundle
      9. Additional Resources
      10. Conventions
  2. Introduction to Scikit-Learn
    1. Introduction
    2. Scikit-Learn
      1. Advantages of Scikit-Learn
      2. Disadvantages of Scikit-Learn
    3. Data Representation
      1. Tables of Data
      2. Features and Target Matrices
      3. Exercise 1: Loading a Sample Dataset and Creating the Features and Target Matrices
      4. Activity 1: Selecting a Target Feature and Creating a Target Matrix
    4. Data Preprocessing
      1. Messy Data
      2. Exercise 2: Dealing with Messy Data
      3. Dealing with Categorical Features
      4. Exercise 3: Applying Feature Engineering over Text Data
      5. Rescaling Data
      6. Exercise 4: Normalizing and Standardizing Data
      7. Activity 2: Preprocessing an Entire Dataset
    5. Scikit-Learn API
      1. How Does It Work?
    6. Supervised and Unsupervised Learning
      1. Supervised Learning
      2. Unsupervised Learning
    7. Summary
  3. Unsupervised Learning: Real-Life Applications
    1. Introduction
    2. Clustering
      1. Clustering Types
      2. Applications of Clustering
    3. Exploring a Dataset: Wholesale Customers Dataset
      1. Understanding the Dataset
    4. Data Visualization
      1. Loading the Dataset Using Pandas
      2. Visualization Tools
      3. Exercise 5: Plotting a Histogram of One Feature from the Noisy Circles Dataset
      4. Activity 3: Using Data Visualization to Aid the Preprocessing Process
    5. k-means Algorithm
      1. Understanding the Algorithm
      2. Exercise 6: Importing and Training the k-means Algorithm over a Dataset
      3. Activity 4: Applying the k-means Algorithm to a Dataset
    6. Mean-Shift Algorithm
      1. Understanding the Algorithm
      2. Exercise 7: Importing and Training the Mean-Shift Algorithm over a Dataset
      3. Activity 5: Applying the Mean-Shift Algorithm to a Dataset
    7. DBSCAN Algorithm
      1. Understanding the Algorithm
      2. Exercise 8: Importing and Training the DBSCAN Algorithm over a Dataset
      3. Activity 6: Applying the DBSCAN Algorithm to the Dataset
    8. Evaluating the Performance of Clusters
      1. Available Metrics in Scikit-Learn
      2. Exercise 9: Evaluating the Silhouette Coefficient Score and Calinski–Harabasz Index
      3. Activity 7: Measuring and Comparing the Performance of the Algorithms
    9. Summary
  4. Supervised Learning: Key Steps
    1. Introduction
    2. Model Validation and Testing
      1. Data Partition
      2. Split Ratio
      3. Exercise 10: Performing Data Partition over a Sample Dataset
      4. Cross Validation
      5. Exercise 11: Using Cross-Validation to Partition the Train Set into a Training and a Validation Set
      6. Activity 8: Data Partition over a Handwritten Digit Dataset
    3. Evaluation Metrics
      1. Evaluation Metrics for Classification Tasks
      2. Exercise 12: Calculating Different Evaluation Metrics over a Classification Task
      3. Choosing an Evaluation Metric
      4. Evaluation Metrics for Regression Tasks
      5. Exercise 13: Calculating Evaluation Metrics over a Regression Task
      6. Activity 9: Evaluating the Performance of the Model Trained over a Handwritten Dataset
    4. Error Analysis
      1. Bias, Variance, and Data Mismatch
      2. Exercise 14: Calculating the Error Rate over Different Sets of Data
      3. Activity 10: Performing Error Analysis over a Model Trained to Recognize Handwritten Digits
    5. Summary
  5. Supervised Learning Algorithms: Predict Annual Income
    1. Introduction
    2. Exploring the Dataset
      1. Understanding the Dataset
    3. Naïve Bayes Algorithm
      1. How Does It Work?
      2. Exercise 15: Applying the Naïve Bayes Algorithm
      3. Activity 11: Training a Naïve Bayes Model for Our Census Income Dataset
    4. Decision Tree Algorithm
      1. How Does It Work?
      2. Exercise 16: Applying the Decision Tree Algorithm
      3. Activity 12: Training a Decision Tree Model for Our Census Income Dataset
    5. Support Vector Machine Algorithm
      1. How Does It Work?
      2. Exercise 17: Applying the SVM Algorithm
      3. Activity 13: Training an SVM Model for Our Census Income Dataset
    6. Error Analysis
      1. Accuracy, Precision, and Recall
    7. Summary
  6. Artificial Neural Networks: Predict Annual Income
    1. Introduction
    2. Artificial Neural Networks
      1. How Do They Work?
      2. Understanding the Hyperparameters
      3. Applications
      4. Limitations
    3. Applying an Artificial Neural Network
      1. Scikit-Learn's Multilayer Perceptron
      2. Exercise 18: Applying the Multilayer Perceptron Classifier Class
      3. Activity 14: Training a Multilayer Perceptron for Our Census Income Dataset
    4. Performance Analysis
      1. Error Analysis
      2. Hyperparameter Fine-Tuning
      3. Model Comparison
      4. Activity 15: Comparing Different Models to Choose the Best Fit for the Census Income Data Problem
    5. Summary
  7. Building Your Own Program
    1. Introduction
    2. Program Definition
      1. Building a Program: Key Stages
      2. Understanding the Dataset
      3. Activity 16: Performing the Preparation and Creation Stages for the Bank Marketing Dataset
    3. Saving and Loading a Trained Model
      1. Saving a Model
      2. Exercise 19: Saving a Trained Model
      3. Loading a Model
      4. Exercise 20: Loading a Saved Model
      5. Activity 17: Saving and Loading the Final Model for the Bank Marketing Dataset
    4. Interacting with a Trained Model
      1. Exercise 21: Creating a Class and a Channel to Interact with a Trained Model
      2. Activity 18: Allowing Interaction with the Bank Marketing Dataset Model
    5. Summary
  8. Appendix
    1. Chapter 1: Introduction to scikit-learn
      1. Activity 1: Selecting a Target Feature and Creating a Target Matrix
      2. Activity 2: Preprocessing an Entire Dataset
    2. Chapter 2: Unsupervised Learning: Real-life Applications
      1. Activity 3: Using Data Visualization to Aid the Preprocessing Process
      2. Activity 4: Applying the k-means Algorithm to a Dataset
      3. Activity 5: Applying the Mean-Shift Algorithm to a Dataset
      4. Activity 6: Applying the DBSCAN Algorithm to the Dataset
      5. Activity 7: Measuring and Comparing the Performance of the Algorithms
    3. Chapter 3: Supervised Learning: Key Steps
      1. Activity 8: Data Partition over a Handwritten Digit Dataset
      2. Activity 9: Evaluating the Performance of the Model Trained over a Handwritten Dataset
      3. Activity 10: Performing Error Analysis over a Model Trained to Recognize Handwritten Digits
    4. Chapter 4: Supervised Learning Algorithms: Predict Annual Income
      1. Activity 11: Training a Naïve Bayes Model for our Census Income Dataset
      2. Activity 12: Training a Decision Tree Model for our Census Income Dataset
      3. Activity 13: Training an SVM Model for our Census Income Dataset
    5. Chapter 5: Artificial Neural Networks: Predict Annual Income
      1. Activity 14: Training a Multilayer Perceptron for our Census Income Dataset
      2. Activity 15: Comparing Different Models to Choose the Best Fit for the Census Income Data Problem
    6. Chapter 6: Building Your Own Program
      1. Activity 16: Performing the Preparation and Creation Stages for the Bank Marketing Dataset
      2. Activity 17: Saving and Loading the Final Model for the Bank Marketing Dataset
      3. Activity 18: Allowing Interaction with the Bank Marketing Dataset Model

Product information

  • Title: Machine Learning Fundamentals
  • Author(s): Hyatt Saleh
  • Release date: November 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789803556