Chapter 13. Graphs and Foundation Models for Unsupervised Segmentation
Extracting meaningful information from 3D datasets is challenging. We have massive data with intricate details, but we lack the integrated intelligence needed for high-level tasks. This gap limits the potential for advanced 3D scene understanding: without semantics and topology, we cannot extract individual objects and their relationships, such as chairs and tables, and their arrangement within a room. Let’s leverage 3D machine learning to extract this information.
Supervised learning, which thrives on labeled data, is the first approach we investigate. However, a major hurdle is the scarcity of labeled 3D datasets: without abundant labels, what such a system can achieve is limited. The good news is that the field is making astonishing technological leaps, especially when we leverage cutting-edge research in unsupervised segmentation. But to bring human-level reasoning to computers, we must extract formalized meaning from the 3D entities we observe.
This is why we combine 3D point clouds, graph theory, and deep learning in this chapter to unlock new scene-understanding capabilities for interpreting our visual world. Among these advancements, I want to focus on two major solutions:
- Connectivity-based clustering (see Chapter 12), leveraging the power of graph theory (see the first sketch after this list)
- Image-based 3D segmentation using a foundation model: the Segment Anything Model (see the second sketch after this list)
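To preview the first solution, here is a minimal, self-contained sketch of connectivity-based clustering: every pair of points closer than a radius threshold is linked by an edge, and each connected component of the resulting graph becomes one cluster. The two synthetic blobs and the 0.3 radius are illustrative assumptions, not values from this chapter.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs stand in for two objects in a scene
points = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(100, 3)),
    rng.normal(loc=1.0, scale=0.1, size=(100, 3)),
])

# Link every pair of points within the radius threshold
tree = cKDTree(points)
pairs = tree.query_pairs(r=0.3, output_type="ndarray")

# Build a sparse adjacency matrix; each connected component is one cluster
n = len(points)
adjacency = csr_matrix(
    (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n)
)
n_clusters, labels = connected_components(adjacency, directed=False)
print(n_clusters)  # 2: one cluster per blob
```

The second solution works in image space: the segment-anything package generates 2D masks that can later be lifted onto the point cloud. The sketch below assumes that package and a downloaded ViT-H checkpoint; the image path and checkpoint filename are placeholders, not files from this book.

```python
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Load an RGB image of the scene (placeholder path)
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)

# Instantiate SAM from a downloaded checkpoint (placeholder filename)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# Each result is a dict holding a boolean "segmentation" array plus metadata
masks = mask_generator.generate(image)
print(len(masks))
```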
Before proceeding, I want to emphasize a key point: there is a critical distinction between ...