Machine Learning Bookcamp

Book description

Machine Learning Bookcamp presents realistic, practical machine learning scenarios, along with crystal-clear coverage of key concepts. In it, you’ll complete engaging projects, such as creating a car price predictor using linear regression and deploying a churn prediction service. You’ll go beyond the algorithms and explore important techniques like deploying ML applications on serverless systems and serving models with Kubernetes and Kubeflow. Dig in, get your hands dirty, and have fun building your ML skills!

Table of contents

  1. inside front cover
  2. Machine Learning Bookcamp
  3. Copyright
  4. brief contents
  5. contents
  6. front matter
    1. foreword
    2. preface
    3. acknowledgments
    4. about this book
      1. Who should read this book
      2. How this book is organized: a roadmap
      3. About the code
      4. liveBook discussion forum
      5. Other online resources
    5. about the author
    6. about the cover illustration
  7. 1 Introduction to machine learning
    1. 1.1 Machine learning
      1. 1.1.1 Machine learning vs. rule-based systems
      2. 1.1.2 When machine learning isn’t helpful
      3. 1.1.3 Supervised machine learning
    2. 1.2 Machine learning process
      1. 1.2.1 Business understanding
      2. 1.2.2 Data understanding
      3. 1.2.3 Data preparation
      4. 1.2.4 Modeling
      5. 1.2.5 Evaluation
      6. 1.2.6 Deployment
      7. 1.2.7 Iterate
    3. 1.3 Modeling and model validation
    4. Summary
  8. 2 Machine learning for regression
    1. 2.1 Car-price prediction project
      1. 2.1.1 Downloading the dataset
    2. 2.2 Exploratory data analysis
      1. 2.2.1 Exploratory data analysis toolbox
      2. 2.2.2 Reading and preparing data
      3. 2.2.3 Target variable analysis
      4. 2.2.4 Checking for missing values
      5. 2.2.5 Validation framework
    3. 2.3 Machine learning for regression
      1. 2.3.1 Linear regression
      2. 2.3.2 Training linear regression model
    4. 2.4 Predicting the price
      1. 2.4.1 Baseline solution
      2. 2.4.2 RMSE: Evaluating model quality
      3. 2.4.3 Validating the model
      4. 2.4.4 Simple feature engineering
      5. 2.4.5 Handling categorical variables
      6. 2.4.6 Regularization
      7. 2.4.7 Using the model
    5. 2.5 Next steps
      1. 2.5.1 Exercises
      2. 2.5.2 Other projects
    6. Summary
    7. Answers to exercises
  9. 3 Machine learning for classification
    1. 3.1 Churn prediction project
      1. 3.1.1 Telco churn dataset
      2. 3.1.2 Initial data preparation
      3. 3.1.3 Exploratory data analysis
      4. 3.1.4 Feature importance
    2. 3.2 Feature engineering
      1. 3.2.1 One-hot encoding for categorical variables
    3. 3.3 Machine learning for classification
      1. 3.3.1 Logistic regression
      2. 3.3.2 Training logistic regression
      3. 3.3.3 Model interpretation
      4. 3.3.4 Using the model
    4. 3.4 Next steps
      1. 3.4.1 Exercises
      2. 3.4.2 Other projects
    5. Summary
    6. Answers to exercises
  10. 4 Evaluation metrics for classification
    1. 4.1 Evaluation metrics
      1. 4.1.1 Classification accuracy
      2. 4.1.2 Dummy baseline
    2. 4.2 Confusion table
      1. 4.2.1 Introduction to the confusion table
      2. 4.2.2 Calculating the confusion table with NumPy
      3. 4.2.3 Precision and recall
    3. 4.3 ROC curve and AUC score
      1. 4.3.1 True positive rate and false positive rate
      2. 4.3.2 Evaluating a model at multiple thresholds
      3. 4.3.3 Random baseline model
      4. 4.3.4 The ideal model
      5. 4.3.5 ROC curve
      6. 4.3.6 Area under the ROC curve (AUC)
    4. 4.4 Parameter tuning
      1. 4.4.1 K-fold cross-validation
      2. 4.4.2 Finding best parameters
    5. 4.5 Next steps
      1. 4.5.1 Exercises
      2. 4.5.2 Other projects
    6. Summary
    7. Answers to exercises
  11. 5 Deploying machine learning models
    1. 5.1 Churn-prediction model
      1. 5.1.1 Using the model
      2. 5.1.2 Using Pickle to save and load the model
    2. 5.2 Model serving
      1. 5.2.1 Web services
      2. 5.2.2 Flask
      3. 5.2.3 Serving churn model with Flask
    3. 5.3 Managing dependencies
      1. 5.3.1 Pipenv
      2. 5.3.2 Docker
    4. 5.4 Deployment
      1. 5.4.1 AWS Elastic Beanstalk
    5. 5.5 Next steps
      1. 5.5.1 Exercises
      2. 5.5.2 Other projects
    6. Summary
  12. 6 Decision trees and ensemble learning
    1. 6.1 Credit risk scoring project
      1. 6.1.1 Credit scoring dataset
      2. 6.1.2 Data cleaning
      3. 6.1.3 Dataset preparation
    2. 6.2 Decision trees
      1. 6.2.1 Decision tree classifier
      2. 6.2.2 Decision tree learning algorithm
      3. 6.2.3 Parameter tuning for decision tree
    3. 6.3 Random forest
      1. 6.3.1 Training a random forest
      2. 6.3.2 Parameter tuning for random forest
    4. 6.4 Gradient boosting
      1. 6.4.1 XGBoost: Extreme gradient boosting
      2. 6.4.2 Model performance monitoring
      3. 6.4.3 Parameter tuning for XGBoost
      4. 6.4.4 Testing the final model
    5. 6.5 Next steps
      1. 6.5.1 Exercises
      2. 6.5.2 Other projects
    6. Summary
    7. Answers to exercises
  13. 7 Neural networks and deep learning
    1. 7.1 Fashion classification
      1. 7.1.1 GPU vs. CPU
      2. 7.1.2 Downloading the clothing dataset
      3. 7.1.3 TensorFlow and Keras
      4. 7.1.4 Loading images
    2. 7.2 Convolutional neural networks
      1. 7.2.1 Using a pretrained model
      2. 7.2.2 Getting predictions
    3. 7.3 Internals of the model
      1. 7.3.1 Convolutional layers
      2. 7.3.2 Dense layers
    4. 7.4 Training the model
      1. 7.4.1 Transfer learning
      2. 7.4.2 Loading the data
      3. 7.4.3 Creating the model
      4. 7.4.4 Training the model
      5. 7.4.5 Adjusting the learning rate
      6. 7.4.6 Saving the model and checkpointing
      7. 7.4.7 Adding more layers
      8. 7.4.8 Regularization and dropout
      9. 7.4.9 Data augmentation
      10. 7.4.10 Training a larger model
    5. 7.5 Using the model
      1. 7.5.1 Loading the model
      2. 7.5.2 Evaluating the model
      3. 7.5.3 Getting the predictions
    6. 7.6 Next steps
      1. 7.6.1 Exercises
      2. 7.6.2 Other projects
    7. Summary
    8. Answers to exercises
  14. 8 Serverless deep learning
    1. 8.1 Serverless: AWS Lambda
      1. 8.1.1 TensorFlow Lite
      2. 8.1.2 Converting the model to TF Lite format
      3. 8.1.3 Preparing the images
      4. 8.1.4 Using the TensorFlow Lite model
      5. 8.1.5 Code for the lambda function
      6. 8.1.6 Preparing the Docker image
      7. 8.1.7 Pushing the image to AWS ECR
      8. 8.1.8 Creating the lambda function
      9. 8.1.9 Creating the API Gateway
    2. 8.2 Next steps
      1. 8.2.1 Exercises
      2. 8.2.2 Other projects
    3. Summary
  15. 9 Serving models with Kubernetes and Kubeflow
    1. 9.1 Kubernetes and Kubeflow
    2. 9.2 Serving models with TensorFlow Serving
      1. 9.2.1 Overview of the serving architecture
      2. 9.2.2 The saved_model format
      3. 9.2.3 Running TensorFlow Serving locally
      4. 9.2.4 Invoking the TF Serving model from Jupyter
      5. 9.2.5 Creating the Gateway service
    3. 9.3 Model deployment with Kubernetes
      1. 9.3.1 Introduction to Kubernetes
      2. 9.3.2 Creating a Kubernetes cluster on AWS
      3. 9.3.3 Preparing the Docker images
      4. 9.3.4 Deploying to Kubernetes
      5. 9.3.5 Testing the service
    4. 9.4 Model deployment with Kubeflow
      1. 9.4.1 Preparing the model: Uploading it to S3
      2. 9.4.2 Deploying TensorFlow models with KFServing
      3. 9.4.3 Accessing the model
      4. 9.4.4 KFServing transformers
      5. 9.4.5 Testing the transformer
      6. 9.4.6 Deleting the EKS cluster
    5. 9.5 Next steps
      1. 9.5.1 Exercises
      2. 9.5.2 Other projects
    6. Summary
  16. Appendix A. Preparing the environment
    1. A.1 Installing Python and Anaconda
      1. A.1.1 Installing Python and Anaconda on Linux
      2. A.1.2 Installing Python and Anaconda on Windows
      3. A.1.3 Installing Python and Anaconda on macOS
    2. A.2 Running Jupyter
      1. A.2.1 Running Jupyter on Linux
      2. A.2.2 Running Jupyter on Windows
      3. A.2.3 Running Jupyter on macOS
    3. A.3 Installing the Kaggle CLI
    4. A.4 Accessing the source code
    5. A.5 Installing Docker
      1. A.5.1 Installing Docker on Linux
      2. A.5.2 Installing Docker on Windows
      3. A.5.3 Installing Docker on macOS
    6. A.6 Renting a server on AWS
      1. A.6.1 Registering on AWS
      2. A.6.2 Accessing billing information
      3. A.6.3 Creating an EC2 instance
      4. A.6.4 Connecting to the instance
      5. A.6.5 Shutting down the instance
      6. A.6.6 Configuring AWS CLI
  17. Appendix B. Introduction to Python
    1. B.1 Variables
      1. B.1.1 Control flow
      2. B.1.2 Collections
      3. B.1.3 Code reusability
      4. B.1.4 Installing libraries
      5. B.1.5 Python programs
  18. Appendix C. Introduction to NumPy
    1. C.1 NumPy
      1. C.1.1 NumPy arrays
      2. C.1.2 Two-dimensional NumPy arrays
      3. C.1.3 Randomly generated arrays
    2. C.2 NumPy operations
      1. C.2.1 Element-wise operations
      2. C.2.2 Summarizing operations
      3. C.2.3 Sorting
      4. C.2.4 Reshaping and combining
      5. C.2.5 Slicing and filtering
    3. C.3 Linear algebra
      1. C.3.1 Multiplication
      2. C.3.2 Matrix inverse
      3. C.3.3 Normal equation
  19. Appendix D. Introduction to Pandas
    1. D.1 Pandas
      1. D.1.1 DataFrame
      2. D.1.2 Series
      3. D.1.3 Index
      4. D.1.4 Accessing rows
      5. D.1.5 Splitting a DataFrame
    2. D.2 Operations
      1. D.2.1 Element-wise operations
      2. D.2.2 Filtering
      3. D.2.3 String operations
      4. D.2.4 Summarizing operations
      5. D.2.5 Missing values
      6. D.2.6 Sorting
      7. D.2.7 Grouping
  20. Appendix E. AWS SageMaker
    1. E.1 AWS SageMaker Notebooks
      1. E.1.1 Increasing the GPU quota limits
      2. E.1.2 Creating a notebook instance
      3. E.1.3 Training a model
      4. E.1.4 Turning off the notebook
  21. index

Product information

  • Title: Machine Learning Bookcamp
  • Author(s): Alexey Grigorev
  • Release date: October 2021
  • Publisher(s): Manning Publications
  • ISBN: 9781617296819