Practical Machine Learning for Computer Vision

by Valliappa Lakshmanan, Martin Görner, Ryan Gillard

July 2021

Intermediate to advanced

480 pages

12h 44m

English

O'Reilly Media, Inc.

Preface
Who Is This Book For?; How to Use This Book; Organization of the Book; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Machine Learning for Computer Vision
Machine Learning; Deep Learning Use Cases; Summary
2. ML Models for Vision
A Dataset for Machine Perception; 5-Flowers Dataset; Reading Image Data; Visualizing Image Data; Reading the Dataset File; A Linear Model Using Keras; Keras Model; Training the Model; A Neural Network Using Keras; Neural Networks; Deep Neural Networks; Summary; Glossary
3. Image Vision
Pretrained Embeddings; Pretrained Model; Transfer Learning; Fine-Tuning; Convolutional Networks; Convolutional Filters; Stacking Convolutional Layers; Pooling Layers; AlexNet; The Quest for Depth; Filter Factorization; 1x1 Convolutions; VGG19; Global Average Pooling; Modular Architectures; Inception; SqueezeNet; ResNet and Skip Connections; DenseNet; Depth-Separable Convolutions; Xception; Neural Architecture Search Designs; NASNet; The MobileNet Family; Beyond Convolution: The Transformer Architecture; Choosing a Model; Performance Comparison; Ensembling; Recommended Strategy; Summary
4. Object Detection and Image Segmentation
Object Detection; YOLO; RetinaNet; Segmentation; Mask R-CNN and Instance Segmentation; U-Net and Semantic Segmentation; Summary
5. Creating Vision Datasets
Collecting Images; Photographs; Imaging; Proof of Concept; Data Types; Channels; Geospatial Data; Audio and Video; Manual Labeling; Multilabel; Object Detection; Labeling at Scale; Labeling User Interface; Multiple Tasks; Voting and Crowdsourcing; Labeling Services; Automated Labeling; Labels from Related Data; Noisy Student; Self-Supervised Learning; Bias; Sources of Bias; Selection Bias; Measurement Bias; Confirmation Bias; Detecting Bias; Creating a Dataset; Splitting Data; TensorFlow Records; Reading TensorFlow Records; Summary
6. Preprocessing
Reasons for Preprocessing; Shape Transformation; Data Quality Transformation; Improving Model Quality; Size and Resolution; Using Keras Preprocessing Layers; Using the TensorFlow Image Module; Mixing Keras and TensorFlow; Model Training; Training-Serving Skew; Reusing Functions; Preprocessing Within the Model; Using tf.transform; Data Augmentation; Spatial Transformations; Color Distortion; Information Dropping; Forming Input Images; Summary
7. Training Pipeline
Efficient Ingestion; Storing Data Efficiently; Reading Data in Parallel; Maximizing GPU Utilization; Saving Model State; Exporting the Model; Checkpointing; Distribution Strategy; Choosing a Strategy; Creating the Strategy; Serverless ML; Creating a Python Package; Submitting a Training Job; Hyperparameter Tuning; Deploying the Model; Summary
8. Model Quality and Continuous Evaluation
Monitoring; TensorBoard; Weight Histograms; Device Placement; Data Visualization; Training Events; Model Quality Metrics; Metrics for Classification; Metrics for Regression; Metrics for Object Detection; Quality Evaluation; Sliced Evaluations; Fairness Monitoring; Continuous Evaluation; Summary
9. Model Predictions
Making Predictions; Exporting the Model; Using In-Memory Models; Improving Abstraction; Improving Efficiency; Online Prediction; TensorFlow Serving; Modifying the Serving Function; Handling Image Bytes; Batch and Stream Prediction; The Apache Beam Pipeline; Managed Service for Batch Prediction; Invoking Online Prediction; Edge ML; Constraints and Optimizations; TensorFlow Lite; Running TensorFlow Lite; Processing the Image Buffer; Federated Learning; Summary

10. Trends in Production ML
Machine Learning Pipelines; The Need for Pipelines; Kubeflow Pipelines Cluster; Containerizing the Codebase; Writing a Component; Connecting Components; Automating a Run; Explainability; Techniques; Adding Explainability; No-Code Computer Vision; Why Use No-Code?; Loading Data; Training; Evaluation; Summary
11. Advanced Vision Problems
Object Measurement; Reference Object; Segmentation; Rotation Correction; Ratio and Measurements; Counting; Density Estimation; Extracting Patches; Simulating Input Images; Regression; Prediction; Pose Estimation; PersonLab; The PoseNet Model; Identifying Multiple Poses; Image Search; Distributed Search; Fast Search; Better Embeddings; Summary
12. Image and Text Generation
Image Understanding; Embeddings; Auxiliary Learning Tasks; Autoencoders; Variational Autoencoders; Image Generation; Generative Adversarial Networks; GAN Improvements; Image-to-Image Translation; Super-Resolution; Modifying Pictures (Inpainting); Anomaly Detection; Deepfakes; Image Captioning; Dataset; Tokenizing the Captions; Batching; Captioning Model; Training Loop; Prediction; Summary
Afterword
Index

Content preview from Practical Machine Learning for Computer Vision

Chapter 11. Advanced Vision Problems

So far in this book, we have looked primarily at the problem of classifying an entire image. In Chapter 2 we touched on image regression, and in Chapter 4 we discussed object detection and image segmentation. In this chapter, we will look at more advanced problems that can be solved using computer vision: measurement, counting, pose estimation, and image search.

Tip

The code for this chapter is in the 11_adv_problems folder of the book’s GitHub repository. We will provide file names for code samples and notebooks where applicable.

Object Measurement

Sometimes we want to know the measurements of an object within an image (e.g., that a sofa is 180 cm long). While we can simply use pixel-wise regression to measure something like ground precipitation using aerial images of cloud cover, we will need to do something more sophisticated for the object measurement scenario. We can’t simply count the number of pixels and infer a size from that, because the same object could be represented by a different number of pixels due to where it is within the image, its rotation, aspect ratio, etc. Let’s walk through the four steps needed to measure an object from a photograph of it, following an approach suggested by Imaginea Labs.
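To make the point about raw pixel counts concrete, here is a minimal sketch of why a reference object of known size helps. It is not the book's code: the function names `pixels_per_cm` and `measure_cm` and all of the numbers are hypothetical, and it assumes the reference object and the object being measured lie in the same plane, with rotation and perspective already corrected.

```python
def pixels_per_cm(reference_length_px: float, reference_length_cm: float) -> float:
    """Estimate the photo's scale from a reference object of known size."""
    if reference_length_px <= 0 or reference_length_cm <= 0:
        raise ValueError("reference lengths must be positive")
    return reference_length_px / reference_length_cm


def measure_cm(object_length_px: float, scale_px_per_cm: float) -> float:
    """Convert a length in pixels to centimeters using the photo's scale."""
    if scale_px_per_cm <= 0:
        raise ValueError("scale must be positive")
    return object_length_px / scale_px_per_cm


# Hypothetical numbers: the same sofa photographed from two distances,
# with a 30 cm reference object next to it in both photos.
REFERENCE_CM = 30.0
photos = {
    "close_up": {"sofa_px": 900.0, "reference_px": 150.0},
    "far_away": {"sofa_px": 450.0, "reference_px": 75.0},
}

for name, photo in photos.items():
    scale = pixels_per_cm(photo["reference_px"], REFERENCE_CM)
    print(name, measure_cm(photo["sofa_px"], scale))
```

The sofa spans 900 pixels in one photo and 450 in the other, so a pixel count alone cannot give its size. Dividing by the scale measured from the reference object yields the same length in centimeters for both photos.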

Reference Object

Suppose we’re an online shoe store, and we want to help customers find the best shoe size by using photographs of their footprints. We ask customers to get their feet wet and step onto a paper material, then upload ...

Publisher Resources

ISBN: 9781098102357
