Data is the lifeblood of deep learning applications. As such, training data should be able to flow unobstructed into networks, and it should contain all the meaningful information needed to prepare models for their tasks. Often, however, datasets have complex structures or are stored on heterogeneous devices, which complicates efficiently feeding their content to the models. In other cases, relevant training images or annotations can be unavailable, depriving models of the information they need to learn.
Thankfully, for the former cases, TensorFlow provides a rich framework to set up optimized data pipelines—tf.data. For the latter cases, researchers have been proposing ...