Multimodal Scene Understanding

Book description

Multimodal Scene Understanding: Algorithms, Applications and Deep Learning presents recent advances in multimodal computing, with a focus on computer vision and photogrammetry. It provides the latest algorithms and applications that combine multiple sources of information, and describes the role of multi-sensory data and approaches to multimodal deep learning. The book is ideal for researchers from the fields of computer vision, remote sensing, robotics, and photogrammetry, and thereby helps foster interdisciplinary interaction and collaboration among these communities.

Researchers who collect and analyze multi-sensory data (for example, the KITTI benchmark, which combines stereo and laser data) from platforms such as autonomous vehicles, surveillance cameras, UAVs, planes, and satellites will find this book very useful.

  • Contains state-of-the-art developments in multimodal computing
  • Focuses on algorithms and applications
  • Presents novel deep learning topics on multi-sensor fusion and multimodal deep learning

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. List of Contributors
  6. Chapter 1: Introduction to Multimodal Scene Understanding
    1. Abstract
    2. 1.1. Introduction
    3. 1.2. Organization of the Book
    4. References
  7. Chapter 2: Deep Learning for Multimodal Data Fusion
    1. Abstract
    2. 2.1. Introduction
    3. 2.2. Related Work
    4. 2.3. Basics of Multimodal Deep Learning: VAEs and GANs
    5. 2.4. Multimodal Image-to-Image Translation Networks
    6. 2.5. Multimodal Encoder–Decoder Networks
    7. 2.6. Experiments
    8. 2.7. Conclusion
    9. References
  8. Chapter 3: Multimodal Semantic Segmentation: Fusion of RGB and Depth Data in Convolutional Neural Networks
    1. Abstract
    2. 3.1. Introduction
    3. 3.2. Overview
    4. 3.3. Methods
    5. 3.4. Results and Discussion
    6. 3.5. Conclusion
    7. References
  9. Chapter 4: Learning Convolutional Neural Networks for Object Detection with Very Little Training Data
    1. Abstract
    2. Acknowledgement
    3. 4.1. Introduction
    4. 4.2. Fundamentals
    5. 4.3. Related Work
    6. 4.4. Traffic Sign Detection
    7. 4.5. Localization
    8. 4.6. Clustering
    9. 4.7. Dataset
    10. 4.8. Experiments
    11. 4.9. Conclusion
    12. References
  10. Chapter 5: Multimodal Fusion Architectures for Pedestrian Detection
    1. Abstract
    2. Acknowledgement
    3. 5.1. Introduction
    4. 5.2. Related Work
    5. 5.3. Proposed Method
    6. 5.4. Experimental Results and Discussion
    7. 5.5. Conclusion
    8. References
  11. Chapter 6: Multispectral Person Re-Identification Using GAN for Color-to-Thermal Image Translation
    1. Abstract
    2. Acknowledgements
    3. 6.1. Introduction
    4. 6.2. Related Work
    5. 6.3. ThermalWorld Dataset
    6. 6.4. Method
    7. 6.5. Evaluation
    8. 6.6. Conclusion
    9. References
  12. Chapter 7: A Review and Quantitative Evaluation of Direct Visual–Inertial Odometry
    1. Abstract
    2. 7.1. Introduction
    3. 7.2. Related Work
    4. 7.3. Background: Nonlinear Optimization and Lie Groups
    5. 7.4. Background: Direct Sparse Odometry
    6. 7.5. Direct Sparse Visual–Inertial Odometry
    7. 7.6. Calculating the Relative Jacobians
    8. 7.7. Results
    9. 7.8. Conclusion
    10. References
  13. Chapter 8: Multimodal Localization for Embedded Systems: A Survey
    1. Abstract
    2. 8.1. Introduction
    3. 8.2. Positioning Systems and Perception Sensors
    4. 8.3. State of the Art on Localization Methods
    5. 8.4. Multimodal Localization for Embedded Systems
    6. 8.5. Application Domains
    7. 8.6. Conclusion
    8. References
  14. Chapter 9: Self-Supervised Learning from Web Data for Multimodal Retrieval
    1. Abstract
    2. Acknowledgements
    3. 9.1. Introduction
    4. 9.2. Related Work
    5. 9.3. Multimodal Text–Image Embedding
    6. 9.4. Text Embeddings
    7. 9.5. Benchmarks
    8. 9.6. Retrieval on InstaCities1M and WebVision Datasets
    9. 9.7. Retrieval in the MIRFlickr Dataset
    10. 9.8. Comparing the Image and Text Embeddings
    11. 9.9. Visualizing CNN Activation Maps
    12. 9.10. Visualizing the Learned Semantic Space with t-SNE
    13. 9.11. Conclusions
    14. References
  15. Chapter 10: 3D Urban Scene Reconstruction and Interpretation from Multisensor Imagery
    1. Abstract
    2. 10.1. Introduction
    3. 10.2. Pose Estimation for Wide-Baseline Image Sets
    4. 10.3. Dense 3D Reconstruction
    5. 10.4. Scene Classification
    6. 10.5. Scene and Building Decomposition
    7. 10.6. Building Modeling
    8. 10.7. Conclusion and Future Work
    9. References
  16. Chapter 11: Decision Fusion of Remote-Sensing Data for Land Cover Classification
    1. Abstract
    2. 11.1. Introduction
    3. 11.2. Proposed Framework
    4. 11.3. Use Case #1: Hyperspectral and Very High Resolution Multispectral Imagery for Urban Material Discrimination
    5. 11.4. Use Case #2: Urban Footprint Detection
    6. 11.5. Final Outlook and Perspectives
    7. References
  17. Chapter 12: Cross-modal Learning by Hallucinating Missing Modalities in RGB-D Vision
    1. Abstract
    2. 12.1. Introduction
    3. 12.2. Related Work
    4. 12.3. Generalized Distillation with Multiple Stream Networks
    5. 12.4. Experiments
    6. 12.5. Conclusions and Future Work
    7. References
  18. Index

Product information

  • Title: Multimodal Scene Understanding
  • Author(s): Michael Ying Yang, Bodo Rosenhahn, Vittorio Murino
  • Release date: July 2019
  • Publisher(s): Academic Press
  • ISBN: 9780128173596