Chapter 9

Self-Supervised Learning from Web Data for Multimodal Retrieval

Raul Gomez⁎,†; Lluis Gomez†; Jaume Gibert⁎; Dimosthenis Karatzas†
⁎ Eurecat, Centre Tecnològic de Catalunya, Unitat de Tecnologies Audiovisuals, Barcelona, Spain
† Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain

Abstract

Self-supervised learning from multimodal image and text data allows deep neural networks to learn powerful features without human-annotated data. Web and social media platforms provide a virtually unlimited amount of such multimodal data. In this work we propose to exploit this freely available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learned in the text domain and transfer ...
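Once images and text are mapped into a shared embedding space, text-to-image retrieval reduces to ranking images by their similarity to the embedding of a text query. The sketch below illustrates only that ranking step. It assumes the embeddings have already been produced by some trained model (not shown), and it uses cosine similarity as the scoring function; both the metric and the helper names `cosine` and `retrieve` are hypothetical choices for illustration, not the chapter's method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (nu * nv)

def retrieve(query_vec, image_vecs, k=2):
    """Return indices of the k image embeddings most similar to the query.

    Assumes query_vec (from the text side) and image_vecs (from the image side)
    already live in the same joint embedding space.
    """
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(image_vecs)]
    scores.sort(reverse=True)  # highest similarity first
    return [i for _, i in scores[:k]]

# Toy 3-d embeddings, made up purely for illustration.
image_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [1.0, 0.1, 0.0]
print(retrieve(query_vec, image_vecs, k=2))  # [0, 2]
```

The same ranking works in the other direction (image query, text candidates), because both modalities share one space.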


Multimodal Scene Understanding by Michael Ying Yang, Bodo Rosenhahn, Vittorio Murino
