10.2 Unsupervised Pattern Discovery in Structured Data

Computational analysis of structural and relational data is a very broad and active field of research. Here, we briefly review central approaches that are related to relevant substructure detection, including graph mining, optimal subgraph search, graph clustering, biclustering, itemset mining, and higher-order relational data mining.

10.2.1 Graph Mining

Graph mining refers to the search for subgraph patterns with predefined characteristics, in a database of one or multiple graphs. In many cases, it is possible to design algorithms that yield the complete set of solutions; such approaches are called enumerative. In the following presentation, we focus on the most common tasks. We start with the classical problem of frequent subgraph mining [5,6]. Given a database of labeled graphs G1, . . ., Gl, the task is to find all connected subgraphs that occur in at least m graphs, where m is a positive integer referred to as the minimum support threshold. As the same node label may appear multiple times in each graph of the database (e.g., atom names in a database of molecule graphs [7]), these methods generally have to deal with the problem of subgraph isomorphism. In biological networks considering gene or protein relationships, the node labels within a graph are typically unique. However, as the graphs are large, the frequency criterion is typically combined with other criteria such as interaction density or cut thresholds, in order ...

Get Statistical and Machine Learning Approaches for Network Analysis now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.