Practical Machine Learning for Computer Vision

by Valliappa Lakshmanan, Martin Görner, Ryan Gillard

July 2021

Intermediate to advanced

480 pages

12h 44m

English

O'Reilly Media, Inc.

Preface
Who Is This Book For? · How to Use This Book · Organization of the Book · Conventions Used in This Book · Using Code Examples · O’Reilly Online Learning · How to Contact Us · Acknowledgments
1. Machine Learning for Computer Vision
Machine Learning · Deep Learning Use Cases · Summary
2. ML Models for Vision
A Dataset for Machine Perception · 5-Flowers Dataset · Reading Image Data · Visualizing Image Data · Reading the Dataset File · A Linear Model Using Keras · Keras Model · Training the Model · A Neural Network Using Keras · Neural Networks · Deep Neural Networks · Summary · Glossary
3. Image Vision
Pretrained Embeddings · Pretrained Model · Transfer Learning · Fine-Tuning · Convolutional Networks · Convolutional Filters · Stacking Convolutional Layers · Pooling Layers · AlexNet · The Quest for Depth · Filter Factorization · 1x1 Convolutions · VGG19 · Global Average Pooling · Modular Architectures · Inception · SqueezeNet · ResNet and Skip Connections · DenseNet · Depth-Separable Convolutions · Xception · Neural Architecture Search Designs · NASNet · The MobileNet Family · Beyond Convolution: The Transformer Architecture · Choosing a Model · Performance Comparison · Ensembling · Recommended Strategy · Summary
4. Object Detection and Image Segmentation
Object Detection · YOLO · RetinaNet · Segmentation · Mask R-CNN and Instance Segmentation · U-Net and Semantic Segmentation · Summary
5. Creating Vision Datasets
Collecting Images · Photographs · Imaging · Proof of Concept · Data Types · Channels · Geospatial Data · Audio and Video · Manual Labeling · Multilabel · Object Detection · Labeling at Scale · Labeling User Interface · Multiple Tasks · Voting and Crowdsourcing · Labeling Services · Automated Labeling · Labels from Related Data · Noisy Student · Self-Supervised Learning · Bias · Sources of Bias · Selection Bias · Measurement Bias · Confirmation Bias · Detecting Bias · Creating a Dataset · Splitting Data · TensorFlow Records · Reading TensorFlow Records · Summary
6. Preprocessing
Reasons for Preprocessing · Shape Transformation · Data Quality Transformation · Improving Model Quality · Size and Resolution · Using Keras Preprocessing Layers · Using the TensorFlow Image Module · Mixing Keras and TensorFlow · Model Training · Training-Serving Skew · Reusing Functions · Preprocessing Within the Model · Using tf.transform · Data Augmentation · Spatial Transformations · Color Distortion · Information Dropping · Forming Input Images · Summary
7. Training Pipeline
Efficient Ingestion · Storing Data Efficiently · Reading Data in Parallel · Maximizing GPU Utilization · Saving Model State · Exporting the Model · Checkpointing · Distribution Strategy · Choosing a Strategy · Creating the Strategy · Serverless ML · Creating a Python Package · Submitting a Training Job · Hyperparameter Tuning · Deploying the Model · Summary
8. Model Quality and Continuous Evaluation
Monitoring · TensorBoard · Weight Histograms · Device Placement · Data Visualization · Training Events · Model Quality Metrics · Metrics for Classification · Metrics for Regression · Metrics for Object Detection · Quality Evaluation · Sliced Evaluations · Fairness Monitoring · Continuous Evaluation · Summary
9. Model Predictions
Making Predictions · Exporting the Model · Using In-Memory Models · Improving Abstraction · Improving Efficiency · Online Prediction · TensorFlow Serving · Modifying the Serving Function · Handling Image Bytes · Batch and Stream Prediction · The Apache Beam Pipeline · Managed Service for Batch Prediction · Invoking Online Prediction · Edge ML · Constraints and Optimizations · TensorFlow Lite · Running TensorFlow Lite · Processing the Image Buffer · Federated Learning · Summary
10. Trends in Production ML
Machine Learning Pipelines · The Need for Pipelines · Kubeflow Pipelines Cluster · Containerizing the Codebase · Writing a Component · Connecting Components · Automating a Run · Explainability · Techniques · Adding Explainability · No-Code Computer Vision · Why Use No-Code? · Loading Data · Training · Evaluation · Summary
11. Advanced Vision Problems
Object Measurement · Reference Object · Segmentation · Rotation Correction · Ratio and Measurements · Counting · Density Estimation · Extracting Patches · Simulating Input Images · Regression · Prediction · Pose Estimation · PersonLab · The PoseNet Model · Identifying Multiple Poses · Image Search · Distributed Search · Fast Search · Better Embeddings · Summary
12. Image and Text Generation
Image Understanding · Embeddings · Auxiliary Learning Tasks · Autoencoders · Variational Autoencoders · Image Generation · Generative Adversarial Networks · GAN Improvements · Image-to-Image Translation · Super-Resolution · Modifying Pictures (Inpainting) · Anomaly Detection · Deepfakes · Image Captioning · Dataset · Tokenizing the Captions · Batching · Captioning Model · Training Loop · Prediction · Summary
Afterword
Index

Overview

This practical book shows you how to employ machine learning models to extract information from images. ML engineers and data scientists will learn how to solve a variety of image problems, including classification, object detection, autoencoders, image generation, counting, and captioning, with proven ML techniques. This book provides a great introduction to end-to-end deep learning: dataset creation, data preprocessing, model design, model training, evaluation, deployment, and interpretability.

Google engineers Valliappa Lakshmanan, Martin Görner, and Ryan Gillard show you how to develop accurate and explainable computer vision ML models and put them into large-scale production using robust ML architecture in a flexible and maintainable way. You'll learn how to design, train, evaluate, and predict with models written in TensorFlow or Keras.

You'll learn how to:

Design ML architecture for computer vision tasks
Select a model (such as ResNet, SqueezeNet, or EfficientNet) appropriate to your task
Create an end-to-end ML pipeline to train, evaluate, deploy, and explain your model
Preprocess images to support data augmentation and learnability
Incorporate explainability and responsible AI best practices
Deploy image models as web services or on edge devices
Monitor and manage ML models

Publisher Resources

ISBN: 9781098102357