Python Machine Learning By Example - Third Edition

Book description

A comprehensive guide to get you up to speed with the latest developments in practical machine learning with Python and to deepen your understanding of machine learning (ML) algorithms and techniques

Key Features

  • Dive into machine learning algorithms to solve the complex challenges faced by data scientists today
  • Explore cutting-edge content reflecting deep learning and reinforcement learning developments
  • Use updated Python libraries such as TensorFlow, PyTorch, and scikit-learn to track machine learning projects end-to-end

Book Description

Python Machine Learning By Example, Third Edition serves as a comprehensive gateway into the world of machine learning (ML).

The book has been considerably updated for the latest enterprise requirements, with six new chapters on topics including movie recommendation engine development with Naïve Bayes, recognizing faces with support vector machines, predicting stock prices with artificial neural networks, categorizing images of clothing with convolutional neural networks, making predictions with sequences using recurrent neural networks, and leveraging reinforcement learning for making decisions.

At the same time, this book provides actionable insights on the key fundamentals of ML with Python programming. Hayden applies his expertise to demonstrate implementations of algorithms in Python, both from scratch and with libraries.

Each chapter walks through an industry-adopted application. With the help of realistic examples, you will gain an understanding of the mechanics of ML techniques in areas such as exploratory data analysis, feature engineering, classification, regression, clustering, and NLP.

By the end of this book, you will have gained a broad picture of the ML ecosystem and will be well-versed in the best practices of applying ML techniques to solve problems.

What you will learn

  • Understand the important concepts in ML and data science
  • Use Python to explore the world of data mining and analytics
  • Scale up model training using varied data complexities with Apache Spark
  • Delve deep into text analysis and NLP using Python libraries such as NLTK and Gensim
  • Select and build an ML model and evaluate and optimize its performance
  • Implement ML algorithms from scratch in Python, TensorFlow 2, PyTorch, and scikit-learn

Who this book is for

If you're a machine learning enthusiast, data analyst, or data engineer who is passionate about machine learning and wants to begin working on machine learning assignments, this book is for you.

Prior knowledge of Python coding is assumed. Basic familiarity with statistical concepts will be beneficial but is not required.

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Get in touch
  2. Getting Started with Machine Learning and Python
    1. An introduction to machine learning
      1. Understanding why we need machine learning
      2. Differentiating between machine learning and automation
      3. Machine learning applications
    2. Knowing the prerequisites
    3. Getting started with three types of machine learning
      1. A brief history of the development of machine learning algorithms
    4. Digging into the core of machine learning
      1. Generalizing with data
      2. Overfitting, underfitting, and the bias-variance trade-off
        1. Overfitting
        2. Underfitting
        3. The bias-variance trade-off
      3. Avoiding overfitting with cross-validation
      4. Avoiding overfitting with regularization
      5. Avoiding overfitting with feature selection and dimensionality reduction
    5. Data preprocessing and feature engineering
      1. Preprocessing and exploration
      2. Dealing with missing values
      3. Label encoding
      4. One-hot encoding
      5. Scaling
      6. Feature engineering
      7. Polynomial transformation
      8. Power transforms
      9. Binning
    6. Combining models
      1. Voting and averaging
      2. Bagging
      3. Boosting
      4. Stacking
    7. Installing software and setting up
      1. Setting up Python and environments
      2. Installing the main Python packages
        1. NumPy
        2. SciPy
        3. Pandas
        4. Scikit-learn
        5. TensorFlow
      3. Introducing TensorFlow 2
    8. Summary
    9. Exercises
  3. Building a Movie Recommendation Engine with Naïve Bayes
    1. Getting started with classification
      1. Binary classification
      2. Multiclass classification
      3. Multi-label classification
    2. Exploring Naïve Bayes
      1. Learning Bayes' theorem by example
      2. The mechanics of Naïve Bayes
    3. Implementing Naïve Bayes
      1. Implementing Naïve Bayes from scratch
      2. Implementing Naïve Bayes with scikit-learn
    4. Building a movie recommender with Naïve Bayes
    5. Evaluating classification performance
    6. Tuning models with cross-validation
    7. Summary
    8. Exercise
    9. References
  4. Recognizing Faces with Support Vector Machine
    1. Finding the separating boundary with SVM
      1. Scenario 1 – identifying a separating hyperplane
      2. Scenario 2 – determining the optimal hyperplane
      3. Scenario 3 – handling outliers
      4. Implementing SVM
      5. Scenario 4 – dealing with more than two classes
      6. Scenario 5 – solving linearly non-separable problems with kernels
      7. Choosing between linear and RBF kernels
    2. Classifying face images with SVM
      1. Exploring the face image dataset
      2. Building an SVM-based image classifier
      3. Boosting image classification performance with PCA
    3. Fetal state classification on cardiotocography
    4. Summary
    5. Exercises
  5. Predicting Online Ad Click-Through with Tree-Based Algorithms
    1. A brief overview of ad click-through prediction
    2. Getting started with two types of data – numerical and categorical
    3. Exploring a decision tree from the root to the leaves
      1. Constructing a decision tree
      2. The metrics for measuring a split
        1. Gini Impurity
        2. Information Gain
    4. Implementing a decision tree from scratch
    5. Implementing a decision tree with scikit-learn
    6. Predicting ad click-through with a decision tree
    7. Ensembling decision trees – random forest
    8. Ensembling decision trees – gradient boosted trees
    9. Summary
    10. Exercises
  6. Predicting Online Ad Click-Through with Logistic Regression
    1. Converting categorical features to numerical – one-hot encoding and ordinal encoding
    2. Classifying data with logistic regression
      1. Getting started with the logistic function
      2. Jumping from the logistic function to logistic regression
    3. Training a logistic regression model
      1. Training a logistic regression model using gradient descent
      2. Predicting ad click-through with logistic regression using gradient descent
      3. Training a logistic regression model using stochastic gradient descent
      4. Training a logistic regression model with regularization
      5. Feature selection using L1 regularization
    4. Training on large datasets with online learning
    5. Handling multiclass classification
    6. Implementing logistic regression using TensorFlow
    7. Feature selection using random forest
    8. Summary
    9. Exercises
  7. Scaling Up Prediction to Terabyte Click Logs
    1. Learning the essentials of Apache Spark
      1. Breaking down Spark
      2. Installing Spark
      3. Launching and deploying Spark programs
    2. Programming in PySpark
    3. Learning on massive click logs with Spark
      1. Loading click logs
      2. Splitting and caching the data
      3. One-hot encoding categorical features
      4. Training and testing a logistic regression model
    4. Feature engineering on categorical variables with Spark
      1. Hashing categorical features
      2. Combining multiple variables – feature interaction
    5. Summary
    6. Exercises
  8. Predicting Stock Prices with Regression Algorithms
    1. A brief overview of the stock market and stock prices
    2. What is regression?
    3. Mining stock price data
      1. Getting started with feature engineering
      2. Acquiring data and generating features
    4. Estimating with linear regression
      1. How does linear regression work?
      2. Implementing linear regression from scratch
      3. Implementing linear regression with scikit-learn
      4. Implementing linear regression with TensorFlow
    5. Estimating with decision tree regression
      1. Transitioning from classification trees to regression trees
      2. Implementing decision tree regression
      3. Implementing a regression forest
    6. Estimating with support vector regression
      1. Implementing SVR
    7. Evaluating regression performance
    8. Predicting stock prices with the three regression algorithms
    9. Summary
    10. Exercises
  9. Predicting Stock Prices with Artificial Neural Networks
    1. Demystifying neural networks
      1. Starting with a single-layer neural network
        1. Layers in neural networks
      2. Activation functions
      3. Backpropagation
      4. Adding more layers to a neural network: DL
    2. Building neural networks
      1. Implementing neural networks from scratch
      2. Implementing neural networks with scikit-learn
      3. Implementing neural networks with TensorFlow
    3. Picking the right activation functions
    4. Preventing overfitting in neural networks
      1. Dropout
      2. Early stopping
    5. Predicting stock prices with neural networks
      1. Training a simple neural network
      2. Fine-tuning the neural network
    6. Summary
    7. Exercise
  10. Mining the 20 Newsgroups Dataset with Text Analysis Techniques
    1. How computers understand language – NLP
      1. What is NLP?
      2. The history of NLP
      3. NLP applications
    2. Touring popular NLP libraries and picking up NLP basics
      1. Installing famous NLP libraries
      2. Corpora
      3. Tokenization
      4. PoS tagging
      5. NER
      6. Stemming and lemmatization
      7. Semantics and topic modeling
    3. Getting the newsgroups data
    4. Exploring the newsgroups data
    5. Thinking about features for text data
      1. Counting the occurrence of each word token
      2. Text preprocessing
      3. Dropping stop words
      4. Reducing inflectional and derivational forms of words
    6. Visualizing the newsgroups data with t-SNE
      1. What is dimensionality reduction?
      2. t-SNE for dimensionality reduction
    7. Summary
    8. Exercises
  11. Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
    1. Learning without guidance – unsupervised learning
    2. Clustering newsgroups data using k-means
      1. How does k-means clustering work?
      2. Implementing k-means from scratch
      3. Implementing k-means with scikit-learn
      4. Choosing the value of k
      5. Clustering newsgroups data using k-means
    3. Discovering underlying topics in newsgroups
      1. Topic modeling using NMF
      2. Topic modeling using LDA
    4. Summary
    5. Exercises
  12. Machine Learning Best Practices
    1. Machine learning solution workflow
    2. Best practices in the data preparation stage
      1. Best practice 1 – Completely understanding the project goal
      2. Best practice 2 – Collecting all fields that are relevant
      3. Best practice 3 – Maintaining the consistency of field values
      4. Best practice 4 – Dealing with missing data
      5. Best practice 5 – Storing large-scale data
    3. Best practices in the training sets generation stage
      1. Best practice 6 – Identifying categorical features with numerical values
      2. Best practice 7 – Deciding whether to encode categorical features
      3. Best practice 8 – Deciding whether to select features, and if so, how to do so
      4. Best practice 9 – Deciding whether to reduce dimensionality, and if so, how to do so
      5. Best practice 10 – Deciding whether to rescale features
      6. Best practice 11 – Performing feature engineering with domain expertise
      7. Best practice 12 – Performing feature engineering without domain expertise
      8. Binarization
      9. Discretization
      10. Interaction
      11. Polynomial transformation
      12. Best practice 13 – Documenting how each feature is generated
      13. Best practice 14 – Extracting features from text data
      14. Tf and tf-idf
      15. Word embedding
      16. Word embedding with pre-trained models
    4. Best practices in the model training, evaluation, and selection stage
      1. Best practice 15 – Choosing the right algorithm(s) to start with
        1. Naïve Bayes
        2. Logistic regression
        3. SVM
        4. Random forest (or decision tree)
        5. Neural networks
      2. Best practice 16 – Reducing overfitting
      3. Best practice 17 – Diagnosing overfitting and underfitting
      4. Best practice 18 – Modeling on large-scale datasets
    5. Best practices in the deployment and monitoring stage
      1. Best practice 19 – Saving, loading, and reusing models
      2. Saving and restoring models using pickle
      3. Saving and restoring models in TensorFlow
      4. Best practice 20 – Monitoring model performance
      5. Best practice 21 – Updating models regularly
    6. Summary
    7. Exercises
  13. Categorizing Images of Clothing with Convolutional Neural Networks
    1. Getting started with CNN building blocks
      1. The convolutional layer
      2. The nonlinear layer
      3. The pooling layer
    2. Architecting a CNN for classification
    3. Exploring the clothing image dataset
    4. Classifying clothing images with CNNs
      1. Architecting the CNN model
      2. Fitting the CNN model
      3. Visualizing the convolutional filters
    5. Boosting the CNN classifier with data augmentation
      1. Horizontal flipping for data augmentation
      2. Rotation for data augmentation
      3. Shifting for data augmentation
    6. Improving the clothing image classifier with data augmentation
    7. Summary
    8. Exercises
  14. Making Predictions with Sequences Using Recurrent Neural Networks
    1. Introducing sequential learning
    2. Learning the RNN architecture by example
      1. Recurrent mechanism
      2. Many-to-one RNNs
      3. One-to-many RNNs
      4. Many-to-many (synced) RNNs
      5. Many-to-many (unsynced) RNNs
    3. Training an RNN model
    4. Overcoming long-term dependencies with Long Short-Term Memory
    5. Analyzing movie review sentiment with RNNs
      1. Analyzing and preprocessing the data
      2. Building a simple LSTM network
      3. Stacking multiple LSTM layers
    6. Writing your own War and Peace with RNNs
      1. Acquiring and analyzing the training data
      2. Constructing the training set for the RNN text generator
      3. Building an RNN text generator
      4. Training the RNN text generator
    7. Advancing language understanding with the Transformer model
      1. Exploring the Transformer's architecture
      2. Understanding self-attention
    8. Summary
    9. Exercises
  15. Making Decisions in Complex Environments with Reinforcement Learning
    1. Setting up the working environment
      1. Installing PyTorch
      2. Installing OpenAI Gym
    2. Introducing reinforcement learning with examples
      1. Elements of reinforcement learning
      2. Cumulative rewards
      3. Approaches to reinforcement learning
    3. Solving the FrozenLake environment with dynamic programming
      1. Simulating the FrozenLake environment
      2. Solving FrozenLake with the value iteration algorithm
      3. Solving FrozenLake with the policy iteration algorithm
    4. Performing Monte Carlo learning
      1. Simulating the Blackjack environment
      2. Performing Monte Carlo policy evaluation
      3. Performing on-policy Monte Carlo control
    5. Solving the Taxi problem with the Q-learning algorithm
      1. Simulating the Taxi environment
      2. Developing the Q-learning algorithm
    6. Summary
    7. Exercises
  16. Other Books You May Enjoy
  17. Index

Product information

  • Title: Python Machine Learning By Example - Third Edition
  • Author(s): Yuxi Liu
  • Release date: October 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781800209718