Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

Book description

Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems

Key Features

  • Delve into machine learning with this comprehensive guide to scikit-learn and scientific Python
  • Master the art of data-driven problem-solving with hands-on examples
  • Foster your theoretical and practical knowledge of supervised and unsupervised machine learning algorithms

Book Description

Machine learning is applied everywhere, from business to research and academia, and scikit-learn is a versatile library that is popular among machine learning practitioners. This book serves as a practical guide for anyone looking to build hands-on machine learning solutions with scikit-learn and Python toolkits.

The book begins with an explanation of machine learning concepts and fundamentals, and strikes a balance between theoretical concepts and their applications. Each chapter covers a different set of algorithms, and shows you how to use them to solve real-life problems. You'll also learn about various key supervised and unsupervised machine learning algorithms using practical examples. Whether it is an instance-based learning algorithm, Bayesian estimation, a deep neural network, a tree-based ensemble, or a recommendation system, you'll gain a thorough understanding of its theory and learn when to apply it. As you advance, you'll learn how to deal with unlabeled data and when to use different clustering and anomaly detection algorithms.

By the end of this machine learning book, you'll have learned how to take a data-driven approach to provide end-to-end machine learning solutions. You'll also have discovered how to formulate the problem at hand, prepare required data, and evaluate and deploy models in production.

What you will learn

  • Understand when to use supervised, unsupervised, or reinforcement learning algorithms
  • Find out how to collect and prepare your data for machine learning tasks
  • Tackle imbalanced data and tune your algorithm to balance the bias-variance tradeoff
  • Apply supervised and unsupervised algorithms to overcome various machine learning challenges
  • Employ best practices for tuning your algorithm's hyperparameters
  • Discover how to use neural networks for classification and regression
  • Build, evaluate, and deploy your machine learning solutions to production

Who this book is for

This book is for data scientists, machine learning practitioners, and anyone who wants to learn how machine learning algorithms work and to build different machine learning models using the Python ecosystem. The book will help you take your knowledge of machine learning to the next level by grasping its ins and outs and tailoring it to your needs. Working knowledge of Python and a basic understanding of the underlying mathematical and statistical concepts are required.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Section 1: Supervised Learning
  7. Introduction to Machine Learning
    1. Understanding machine learning
      1. Types of machine learning algorithms
        1. Supervised learning
          1. Classification versus regression
          2. Supervised learning evaluation
        2. Unsupervised learning
        3. Reinforcement learning
    2. The model development life cycle
      1. Understanding a problem
      2. Splitting our data
        1. Finding the best manner to split the data
        2. Making sure the training and the test datasets are separate
        3. Development set
      3. Evaluating our model
      4. Deploying in production and monitoring
      5. Iterating
      6. When to use machine learning
    3. Introduction to scikit-learn
      1. It plays well with the Python data ecosystem
      2. Practical level of abstraction
      3. When not to use scikit-learn
    4. Installing the packages you need
      1. Introduction to pandas
        1. Python's scientific computing ecosystem conventions
    5. Summary
    6. Further reading
  8. Making Decisions with Trees
    1. Understanding decision trees
      1. What are decision trees?
      2. Iris classification
        1. Loading the Iris dataset
        2. Splitting the data
        3. Training the model and using it for prediction
        4. Evaluating our predictions
        5. Which features were more important?
        6. Displaying the internal tree decisions
    2. How do decision trees learn?
      1. Splitting criteria
      2. Preventing overfitting
      3. Predictions
    3. Getting a more reliable score
      1. What to do now to get a more reliable score
      2. ShuffleSplit
    4. Tuning the hyperparameters for higher accuracy
      1. Splitting the data
      2. Trying different hyperparameter values
      3. Comparing the accuracy scores
    5. Visualizing the tree's decision boundaries
      1. Feature engineering
    6. Building decision tree regressors
      1. Predicting people's heights
      2. Regressor's evaluation
      3. Setting sample weights
    7. Summary
  9. Making Decisions with Linear Equations
    1. Understanding linear models
      1. Linear equations
      2. Linear regression
        1. Estimating the amount paid to the taxi driver
    2. Predicting house prices in Boston
      1. Data exploration
      2. Splitting the data
      3. Calculating a baseline
      4. Training the linear regressor
      5. Evaluating our model's accuracy
      6. Showing feature coefficients
      7. Scaling for more meaningful coefficients
      8. Adding polynomial features
        1. Fitting the linear regressor with the derived features
    3. Regularizing the regressor
      1. Training the lasso regressor
      2. Finding the optimum regularization parameter
    4. Finding regression intervals
    5. Getting to know additional linear regressors
    6. Using logistic regression for classification
      1. Understanding the logistic function
      2. Plugging the logistic function into a linear model
        1. Objective function
        2. Regularization
        3. Solvers
      3. Configuring the logistic regression classifier
      4. Classifying the Iris dataset using logistic regression
      5. Understanding the classifier's decision boundaries
    7. Getting to know additional linear classifiers
    8. Summary
  10. Preparing Your Data
    1. Imputing missing values
      1. Setting missing values to 0
      2. Setting missing values to the mean
      3. Using informed estimations for missing values
    2. Encoding non-numerical columns
      1. One-hot encoding
      2. Ordinal encoding
      3. Target encoding
    3. Homogenizing the columns' scale
      1. The standard scaler
      2. The MinMax scaler
      3. RobustScaler
    4. Selecting the most useful features
      1. VarianceThreshold
      2. Filters
        1. f-regression and f-classif
        2. Mutual information
        3. Comparing and using the different filters
      3. Evaluating multiple features at a time
    5. Summary
  11. Image Processing with Nearest Neighbors
    1. Nearest neighbors
    2. Loading and displaying images
    3. Image classification
      1. Using a confusion matrix to understand the model's mistakes
      2. Picking a suitable metric
      3. Setting the correct K
      4. Hyperparameter tuning using GridSearchCV
    4. Using custom distances
    5. Using nearest neighbors for regression
    6. More neighborhood algorithms
      1. Radius neighbors
      2. Nearest centroid classifier
    7. Reducing the dimensions of our image data
      1. Principal component analysis
      2. Neighborhood component analysis
      3. Comparing PCA to NCA
        1. Picking the most informative components
          1. Using the centroid classifier with PCA
        2. Restoring the original image from its components
      4. Finding the most informative pixels
    8. Summary
  12. Classifying Text Using Naive Bayes
    1. Splitting sentences into tokens
      1. Tokenizing with string split
      2. Tokenizing using regular expressions
      3. Using placeholders before tokenizing
    2. Vectorizing text into matrices
      1. Vector space model
        1. Bag of words
        2. Different sentences, same representation
        3. N-grams
        4. Using characters instead of words
        5. Capturing important words with TF-IDF
      2. Representing meanings with word embedding
        1. Word2Vec
    3. Understanding Naive Bayes
      1. The Bayes rule
      2. Calculating the likelihood naively
      3. Naive Bayes implementations
        1. Additive smoothing
    4. Classifying text using a Naive Bayes classifier
      1. Downloading the data
      2. Preparing the data
      3. Precision, recall, and F1 score
      4. Pipelines
        1. Optimizing for different scores
    5. Creating a custom transformer
    6. Summary
  13. Section 2: Advanced Supervised Learning
  14. Neural Networks – Here Comes Deep Learning
    1. Getting to know MLP
      1. Understanding the algorithm's architecture
      2. Training the neural network
        1. Configuring the solvers
    2. Classifying items of clothing
      1. Downloading the Fashion-MNIST dataset
      2. Preparing the data for classification
      3. Experiencing the effects of the hyperparameters
        1. Learning not too quickly and not too slowly
        2. Picking a suitable batch size
        3. Checking whether more training samples are needed
        4. Checking whether more epochs are needed
      4. Choosing the optimum architecture and hyperparameters
      5. Adding your own activation function
    3. Untangling the convolutions
      1. Extracting features by convolving
      2. Reducing the dimensionality of the data via max pooling
      3. Putting it all together
    4. MLP regressors
    5. Summary
  15. Ensembles – When One Model Is Not Enough
    1. Answering the question why ensembles?
      1. Combining multiple estimators via averaging
      2. Boosting multiple biased estimators
    2. Downloading the UCI Automobile dataset
      1. Dealing with missing values
      2. Differentiating between numerical features and categorical ones
      3. Splitting the data into training and test sets
      4. Imputing the missing values and encoding the categorical features
    3. Using random forest for regression
      1. Checking the effect of the number of trees
      2. Understanding the effect of each training feature
    4. Using random forest for classification
      1. The ROC curve
    5. Using bagging regressors
      1. Preparing a mixture of numerical and categorical features
      2. Combining KNN estimators using a bagging meta-estimator
    6. Using gradient boosting to predict automobile prices
      1. Plotting the learning deviance
      2. Comparing the learning rate settings
      3. Using different sample sizes
      4. Stopping earlier and adapting the learning rate
      5. Regression ranges
    7. Using AdaBoost ensembles
    8. Exploring more ensembles
      1. Voting ensembles
      2. Stacking ensembles
      3. Random tree embedding
    9. Summary
  16. The Y is as Important as the X
    1. Scaling your regression targets
    2. Estimating multiple regression targets
      1. Building a multi-output regressor
      2. Chaining multiple regressors
    3. Dealing with compound classification targets
      1. Converting a multi-class problem into a set of binary classifiers
      2. Estimating multiple classification targets
    4. Calibrating a classifier's probabilities
    5. Calculating the precision at k
    6. Summary
  17. Imbalanced Learning – Not Even 1% Win the Lottery
    1. Getting the click prediction dataset
    2. Installing the imbalanced-learn library
    3. Predicting the CTR
      1. Weighting the training samples differently
        1. The effect of the weighting on the ROC
    4. Sampling the training data
      1. Undersampling the majority class
      2. Oversampling the minority class
      3. Combining data sampling with ensembles
    5. Equal opportunity score
    6. Summary
  18. Section 3: Unsupervised Learning and More
  19. Clustering – Making Sense of Unlabeled Data
    1. Understanding clustering
    2. K-means clustering
      1. Creating a blob-shaped dataset
      2. Visualizing our sample data
      3. Clustering with K-means
      4. The silhouette score
      5. Choosing the initial centroids
    3. Agglomerative clustering
      1. Tracing the agglomerative clustering's children
      2. The adjusted Rand index
      3. Choosing the cluster linkage
    4. DBSCAN
    5. Summary
  20. Anomaly Detection – Finding Outliers in Data
    1. Unlabeled anomaly detection
      1. Generating sample data
    2. Detecting anomalies using basic statistics
      1. Using percentiles for multi-dimensional data
    3. Detecting outliers using EllipticEnvelope
    4. Outlier and novelty detection using LOF
      1. Novelty detection using LOF
    5. Detecting outliers using isolation forest
    6. Summary
  21. Recommender System – Getting to Know Their Taste
    1. The different recommendation paradigms
    2. Downloading surprise and the dataset
      1. Downloading the KDD Cup 2012 dataset
      2. Processing and splitting the dataset
      3. Creating a random recommender
    3. Using KNN-inspired algorithms
    4. Using baseline algorithms
    5. Using singular value decomposition
      1. Extracting latent information via SVD
        1. Comparing the similarity measures for the two matrices
      2. Click prediction using SVD
    6. Deploying machine learning models in production
    7. Summary
  22. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
  • Author(s): Tarek Amr
  • Release date: July 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781838826048