Practical Machine Learning for Computer Vision

by Valliappa Lakshmanan, Martin Görner, Ryan Gillard
July 2021
Intermediate to advanced
480 pages
12h 44m
English
O'Reilly Media, Inc.

Chapter 12. Image and Text Generation

So far in this book, we have focused on computer vision methods that act on images. In this chapter, we will look at vision methods that can generate images. Before we get to image generation, though, we have to learn how to train a model to understand what’s in an image so that it knows what to generate. We will also look at the problem of generating text (captions) based on the content of an image.

Tip

The code for this chapter is in the 12_generation folder of the book’s GitHub repository. We will provide file names for code samples and notebooks where applicable.

Image Understanding

It’s one thing to know what components are in an image, but it’s quite another to actually understand what is happening in the image and to use that information for other tasks. In this section, we will quickly recap embeddings and then look at various methods (autoencoders and variational autoencoders) to encode an image and learn about its properties.

Embeddings

A common problem in deep learning use cases is a lack of sufficient data, or of data of high enough quality. In Chapter 3 we discussed transfer learning, which provides a way to extract embeddings learned by a model trained on a larger dataset and apply that knowledge to train an effective model on a smaller dataset.
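As a reminder of what extracting embeddings looks like in practice, here is a minimal sketch using Keras. It is not the book's own code: the function names `build_embedding_model` and `build_classifier`, the 224x224 input size, and the choice of global average pooling are assumptions made for illustration. The pretrained backbone is frozen and only a small dense layer is trained on the smaller dataset.

```python
import tensorflow as tf


def build_embedding_model(weights="imagenet", image_size=224):
    """Return a frozen ResNet50 that maps an image to a 2048-D embedding.

    Hypothetical helper for illustration; weights="imagenet" downloads the
    pretrained weights on first use.
    """
    base = tf.keras.applications.ResNet50(
        include_top=False,   # drop the original ImageNet classification head
        weights=weights,
        pooling="avg",       # global average pooling gives one vector per image
        input_shape=(image_size, image_size, 3),
    )
    base.trainable = False   # keep the pretrained weights fixed
    return base


def build_classifier(num_classes, weights="imagenet", image_size=224):
    """Stack a small trainable head on top of the frozen embeddings.

    Inputs are assumed to be already preprocessed, for example with
    tf.keras.applications.resnet50.preprocess_input.
    """
    base = build_embedding_model(weights, image_size)
    inputs = tf.keras.Input(shape=(image_size, image_size, 3))
    embeddings = base(inputs, training=False)  # run the backbone in inference mode
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(embeddings)
    return tf.keras.Model(inputs, outputs)
```

Calling `build_classifier(num_classes=5)` would give a model in which only the final dense layer's kernel and bias are trainable, which is why this approach can work with relatively little labeled data.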

With transfer learning, the embeddings we use were created by training the model on the same task, such as image classification. For instance, suppose we have a ResNet50 ...

ISBN: 9781098102357