Practical Machine Learning for Computer Vision

by Valliappa Lakshmanan, Martin Görner, Ryan Gillard

July 2021

Intermediate to advanced

480 pages

12h 44m

English

O'Reilly Media, Inc.

Preface
Who Is This Book For?; How to Use This Book; Organization of the Book; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Machine Learning for Computer Vision
Machine Learning; Deep Learning Use Cases; Summary
2. ML Models for Vision
A Dataset for Machine Perception; 5-Flowers Dataset; Reading Image Data; Visualizing Image Data; Reading the Dataset File; A Linear Model Using Keras; Keras Model; Training the Model; A Neural Network Using Keras; Neural Networks; Deep Neural Networks; Summary; Glossary
3. Image Vision
Pretrained Embeddings; Pretrained Model; Transfer Learning; Fine-Tuning; Convolutional Networks; Convolutional Filters; Stacking Convolutional Layers; Pooling Layers; AlexNet; The Quest for Depth; Filter Factorization; 1x1 Convolutions; VGG19; Global Average Pooling; Modular Architectures; Inception; SqueezeNet; ResNet and Skip Connections; DenseNet; Depth-Separable Convolutions; Xception; Neural Architecture Search Designs; NASNet; The MobileNet Family; Beyond Convolution: The Transformer Architecture; Choosing a Model; Performance Comparison; Ensembling; Recommended Strategy; Summary
4. Object Detection and Image Segmentation
Object Detection; YOLO; RetinaNet; Segmentation; Mask R-CNN and Instance Segmentation; U-Net and Semantic Segmentation; Summary
5. Creating Vision Datasets
Collecting Images; Photographs; Imaging; Proof of Concept; Data Types; Channels; Geospatial Data; Audio and Video; Manual Labeling; Multilabel; Object Detection; Labeling at Scale; Labeling User Interface; Multiple Tasks; Voting and Crowdsourcing; Labeling Services; Automated Labeling; Labels from Related Data; Noisy Student; Self-Supervised Learning; Bias; Sources of Bias; Selection Bias; Measurement Bias; Confirmation Bias; Detecting Bias; Creating a Dataset; Splitting Data; TensorFlow Records; Reading TensorFlow Records; Summary
6. Preprocessing
Reasons for Preprocessing; Shape Transformation; Data Quality Transformation; Improving Model Quality; Size and Resolution; Using Keras Preprocessing Layers; Using the TensorFlow Image Module; Mixing Keras and TensorFlow; Model Training; Training-Serving Skew; Reusing Functions; Preprocessing Within the Model; Using tf.transform; Data Augmentation; Spatial Transformations; Color Distortion; Information Dropping; Forming Input Images; Summary
7. Training Pipeline
Efficient Ingestion; Storing Data Efficiently; Reading Data in Parallel; Maximizing GPU Utilization; Saving Model State; Exporting the Model; Checkpointing; Distribution Strategy; Choosing a Strategy; Creating the Strategy; Serverless ML; Creating a Python Package; Submitting a Training Job; Hyperparameter Tuning; Deploying the Model; Summary
8. Model Quality and Continuous Evaluation
Monitoring; TensorBoard; Weight Histograms; Device Placement; Data Visualization; Training Events; Model Quality Metrics; Metrics for Classification; Metrics for Regression; Metrics for Object Detection; Quality Evaluation; Sliced Evaluations; Fairness Monitoring; Continuous Evaluation; Summary
9. Model Predictions
Making Predictions; Exporting the Model; Using In-Memory Models; Improving Abstraction; Improving Efficiency; Online Prediction; TensorFlow Serving; Modifying the Serving Function; Handling Image Bytes; Batch and Stream Prediction; The Apache Beam Pipeline; Managed Service for Batch Prediction; Invoking Online Prediction; Edge ML; Constraints and Optimizations; TensorFlow Lite; Running TensorFlow Lite; Processing the Image Buffer; Federated Learning; Summary
10. Trends in Production ML
Machine Learning Pipelines; The Need for Pipelines; Kubeflow Pipelines Cluster; Containerizing the Codebase; Writing a Component; Connecting Components; Automating a Run; Explainability; Techniques; Adding Explainability; No-Code Computer Vision; Why Use No-Code?; Loading Data; Training; Evaluation; Summary
11. Advanced Vision Problems
Object Measurement; Reference Object; Segmentation; Rotation Correction; Ratio and Measurements; Counting; Density Estimation; Extracting Patches; Simulating Input Images; Regression; Prediction; Pose Estimation; PersonLab; The PoseNet Model; Identifying Multiple Poses; Image Search; Distributed Search; Fast Search; Better Embeddings; Summary
12. Image and Text Generation
Image Understanding; Embeddings; Auxiliary Learning Tasks; Autoencoders; Variational Autoencoders; Image Generation; Generative Adversarial Networks; GAN Improvements; Image-to-Image Translation; Super-Resolution; Modifying Pictures (Inpainting); Anomaly Detection; Deepfakes; Image Captioning; Dataset; Tokenizing the Captions; Batching; Captioning Model; Training Loop; Prediction; Summary
Afterword
Index

Content preview from Practical Machine Learning for Computer Vision

Afterword

In 1966, MIT professor Seymour Papert launched a summer project for his students. The final goal of the project was to name objects in images by matching them with a vocabulary of known objects. He helpfully broke the task down for them into subprojects, and expected the group to be done in a couple of months. It’s safe to say that Dr. Papert underestimated the complexity of the problem a little.

We started this book by looking at naive machine learning approaches like fully connected neural networks that do not take advantage of the special characteristics of images. In Chapter 2, trying the naive approaches allowed us to learn how to read in images, and how to train, evaluate, and predict with machine learning models.

Then, in Chapter 3, we introduced many of the innovative concepts—convolutional filters, max-pooling layers, skip connections, modules, squeeze activation, and so on—that enable modern-day machine learning models to work well at extracting information from images. Implementing these models, practically speaking, involves using either a built-in Keras model or a TensorFlow Hub layer. We also covered transfer learning and fine-tuning in detail.

In Chapter 4, we looked at how to use the computer vision models covered in Chapter 3 to solve two more fundamental problems in computer vision: object detection and image segmentation.

The next few chapters of the book covered, in depth, each of the stages involved in creating production computer vision machine ...

Publisher Resources

ISBN: 9781098102357
