16 Foundation Models in Computer Vision

In the previous chapter, we learned how to build novel applications using NLP and CV techniques. However, doing so requires a significant amount of training, either from scratch or by fine-tuning a pre-trained model. A pre-trained model has generally been trained on a large corpus of data, for example, a dataset like ImageNet, which contains ~14 million images across roughly 21,000 categories. On the internet, however, we have access to hundreds of millions of images and the alt text corresponding to those images. What if we pre-train models on internet-scale data and use those models for different applications involving object detection, segmentation, and text-to-image generation out of the box without ...

Modern Computer Vision with PyTorch - Second Edition by V Kishore Ayyadevara, Yeshwanth Reddy
