What is multimodality learning?
Before we dive deeper, the first question to ask is: what do we mean by multimodality, or multimodal?
Modality refers to a certain type of information and/or the representation format in which information is stored. For example, humans have various sensory modalities, such as light, sound, and pressure. In our case, we are talking more about how the data is acquired and stored. Commonly available modalities include natural language (both spoken and written), visual information (from images or videos), audio (including voice, sounds, and music), Light Detection and Ranging (LIDAR) data, depth images, infrared images, functional MRI, physiological signals such as electrocardiograms (ECG), and so on.
The path of leveraging ...