Practical Machine Learning for Computer Vision

Book description

This practical book shows you how to employ machine learning models to extract information from images. ML engineers and data scientists will learn how to solve a variety of image problems, including classification, object detection, autoencoders, image generation, counting, and captioning, with proven ML techniques. This book provides a great introduction to end-to-end deep learning: dataset creation, data preprocessing, model design, model training, evaluation, deployment, and interpretability.

Google engineers Valliappa Lakshmanan, Martin Görner, and Ryan Gillard show you how to develop accurate and explainable computer vision ML models and put them into large-scale production using robust ML architecture in a flexible and maintainable way. You'll learn how to design, train, evaluate, and predict with models written in TensorFlow or Keras.

You'll learn how to:

  • Design ML architecture for computer vision tasks
  • Select a model (such as ResNet, SqueezeNet, or EfficientNet) appropriate to your task
  • Create an end-to-end ML pipeline to train, evaluate, deploy, and explain your model
  • Preprocess images to augment data and support learnability
  • Incorporate explainability and responsible AI best practices
  • Deploy image models as web services or on edge devices
  • Monitor and manage ML models

Table of contents

  1. Preface
    1. Who Is This Book For?
    2. How to Use This Book
    3. Organization of the Book
    4. Conventions Used in This Book
    5. Using Code Examples
    6. O’Reilly Online Learning
    7. How to Contact Us
    8. Acknowledgments
  2. 1. Machine Learning for Computer Vision
    1. Machine Learning
    2. Deep Learning Use Cases
    3. Summary
  3. 2. ML Models for Vision
    1. A Dataset for Machine Perception
      1. 5-Flowers Dataset
      2. Reading Image Data
      3. Visualizing Image Data
      4. Reading the Dataset File
    2. A Linear Model Using Keras
      1. Keras Model
      2. Training the Model
    3. A Neural Network Using Keras
      1. Neural Networks
      2. Deep Neural Networks
    4. Summary
    5. Glossary
  4. 3. Image Vision
    1. Pretrained Embeddings
      1. Pretrained Model
      2. Transfer Learning
      3. Fine-Tuning
    2. Convolutional Networks
      1. Convolutional Filters
      2. Stacking Convolutional Layers
      3. Pooling Layers
      4. AlexNet
    3. The Quest for Depth
      1. Filter Factorization
      2. 1x1 Convolutions
      3. VGG19
      4. Global Average Pooling
    4. Modular Architectures
      1. Inception
      2. SqueezeNet
      3. ResNet and Skip Connections
      4. DenseNet
      5. Depth-Separable Convolutions
      6. Xception
    5. Neural Architecture Search Designs
      1. NASNet
      2. The MobileNet Family
    6. Beyond Convolution: The Transformer Architecture
    7. Choosing a Model
      1. Performance Comparison
      2. Ensembling
      3. Recommended Strategy
    8. Summary
  5. 4. Object Detection and Image Segmentation
    1. Object Detection
      1. YOLO
      2. RetinaNet
    2. Segmentation
      1. Mask R-CNN and Instance Segmentation
      2. U-Net and Semantic Segmentation
    3. Summary
  6. 5. Creating Vision Datasets
    1. Collecting Images
      1. Photographs
      2. Imaging
      3. Proof of Concept
    2. Data Types
      1. Channels
      2. Geospatial Data
      3. Audio and Video
    3. Manual Labeling
      1. Multilabel
      2. Object Detection
    4. Labeling at Scale
      1. Labeling User Interface
      2. Multiple Tasks
      3. Voting and Crowdsourcing
      4. Labeling Services
    5. Automated Labeling
      1. Labels from Related Data
      2. Noisy Student
      3. Self-Supervised Learning
    6. Bias
      1. Sources of Bias
      2. Selection Bias
      3. Measurement Bias
      4. Confirmation Bias
      5. Detecting Bias
    7. Creating a Dataset
      1. Splitting Data
      2. TensorFlow Records
      3. Reading TensorFlow Records
    8. Summary
  7. 6. Preprocessing
    1. Reasons for Preprocessing
      1. Shape Transformation
      2. Data Quality Transformation
      3. Improving Model Quality
    2. Size and Resolution
      1. Using Keras Preprocessing Layers
      2. Using the TensorFlow Image Module
      3. Mixing Keras and TensorFlow
      4. Model Training
    3. Training-Serving Skew
      1. Reusing Functions
      2. Preprocessing Within the Model
      3. Using tf.transform
    4. Data Augmentation
      1. Spatial Transformations
      2. Color Distortion
      3. Information Dropping
    5. Forming Input Images
    6. Summary
  8. 7. Training Pipeline
    1. Efficient Ingestion
      1. Storing Data Efficiently
      2. Reading Data in Parallel
      3. Maximizing GPU Utilization
    2. Saving Model State
      1. Exporting the Model
      2. Checkpointing
    3. Distribution Strategy
      1. Choosing a Strategy
      2. Creating the Strategy
    4. Serverless ML
      1. Creating a Python Package
      2. Submitting a Training Job
      3. Hyperparameter Tuning
      4. Deploying the Model
    5. Summary
  9. 8. Model Quality and Continuous Evaluation
    1. Monitoring
      1. TensorBoard
      2. Weight Histograms
      3. Device Placement
      4. Data Visualization
      5. Training Events
    2. Model Quality Metrics
      1. Metrics for Classification
      2. Metrics for Regression
      3. Metrics for Object Detection
    3. Quality Evaluation
      1. Sliced Evaluations
      2. Fairness Monitoring
      3. Continuous Evaluation
    4. Summary
  10. 9. Model Predictions
    1. Making Predictions
      1. Exporting the Model
      2. Using In-Memory Models
      3. Improving Abstraction
      4. Improving Efficiency
    2. Online Prediction
      1. TensorFlow Serving
      2. Modifying the Serving Function
      3. Handling Image Bytes
    3. Batch and Stream Prediction
      1. The Apache Beam Pipeline
      2. Managed Service for Batch Prediction
      3. Invoking Online Prediction
    4. Edge ML
      1. Constraints and Optimizations
      2. TensorFlow Lite
      3. Running TensorFlow Lite
      4. Processing the Image Buffer
      5. Federated Learning
    5. Summary
  11. 10. Trends in Production ML
    1. Machine Learning Pipelines
      1. The Need for Pipelines
      2. Kubeflow Pipelines Cluster
      3. Containerizing the Codebase
      4. Writing a Component
      5. Connecting Components
      6. Automating a Run
    2. Explainability
      1. Techniques
      2. Adding Explainability
    3. No-Code Computer Vision
      1. Why Use No-Code?
      2. Loading Data
      3. Training
      4. Evaluation
    4. Summary
  12. 11. Advanced Vision Problems
    1. Object Measurement
      1. Reference Object
      2. Segmentation
      3. Rotation Correction
      4. Ratio and Measurements
    2. Counting
      1. Density Estimation
      2. Extracting Patches
      3. Simulating Input Images
      4. Regression
      5. Prediction
    3. Pose Estimation
      1. PersonLab
      2. The PoseNet Model
      3. Identifying Multiple Poses
    4. Image Search
      1. Distributed Search
      2. Fast Search
      3. Better Embeddings
    5. Summary
  13. 12. Image and Text Generation
    1. Image Understanding
      1. Embeddings
      2. Auxiliary Learning Tasks
      3. Autoencoders
      4. Variational Autoencoders
    2. Image Generation
      1. Generative Adversarial Networks
      2. GAN Improvements
      3. Image-to-Image Translation
      4. Super-Resolution
      5. Modifying Pictures (Inpainting)
      6. Anomaly Detection
      7. Deepfakes
    3. Image Captioning
      1. Dataset
      2. Tokenizing the Captions
      3. Batching
      4. Captioning Model
      5. Training Loop
      6. Prediction
    4. Summary
  14. Afterword
  15. Index

Product information

  • Title: Practical Machine Learning for Computer Vision
  • Author(s): Valliappa Lakshmanan, Martin Görner, Ryan Gillard
  • Release date: July 2021
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098102364