Deep Learning for Multimodal Data Fusion
Asako Kanezaki⁎; Ryohei Kuga†; Yusuke Sugano†; Yasuyuki Matsushita†
⁎National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
†Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
Abstract
Recent advances in deep learning have enabled realistic image-to-image translation of multimodal data. Alongside this development, auto-encoders and generative adversarial networks (GANs) have been extended to handle multimodal input and output. At the same time, multitask learning has been shown to address multiple mutually related recognition tasks efficiently and effectively. Various scene understanding tasks, such as semantic segmentation and depth prediction, ...