Graph kernels make graph theory available to kernel-based machine learning within a formal framework. Their advantage is that they can potentially exploit structural information not available to vector-based representations alone. In many application domains, graphs are a natural representation of the input data, for example, the molecular structure graph in cheminformatics. On the downside, most graph kernels are computationally demanding, with runtimes that are cubic or worse, which prevents their use with large graphs. Nor are they always interpretable, although this depends strongly on the kernel.
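To make the idea concrete, here is a deliberately minimal sketch (not a kernel from the literature in this exact form): a graph is reduced to the multiset of its vertex labels, and the kernel is the inner product of the two label-count vectors. The function name and the toy "molecular" graphs are illustrative assumptions.

```python
from collections import Counter

def label_histogram_kernel(labels_g, labels_h):
    """Inner product of the vertex-label count vectors of two graphs.

    Each graph is represented here only by the multiset of its vertex
    labels -- a deliberately minimal, hypothetical representation.
    """
    cg, ch = Counter(labels_g), Counter(labels_h)
    # Sum over labels occurring in both graphs.
    return sum(cg[label] * ch[label] for label in cg.keys() & ch.keys())

# Two toy "molecules", given by their atom labels.
g = ["C", "C", "O", "H", "H"]
h = ["C", "O", "O", "H"]
print(label_histogram_kernel(g, h))  # 2*1 + 1*2 + 2*1 = 6
```

Because this kernel is an explicit inner product of feature vectors, it is positive definite by construction; it is also an example of the trade-off discussed below, being cheap to compute but far from complete as a measure of graph similarity.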

What makes a good graph kernel? Borgwardt and Kriegel [33] have argued that a graph kernel “should be (i) a good measure of graph similarity, (ii) computable in polynomial time, (iii) positive definite, and (iv) applicable to all graphs, not just a small subset of graphs.” While (i)–(iii) are reasonable requirements, with (i) being somewhat vague, I disagree with (iv). From basic considerations (Section 8.1.3), a graph kernel is a trade-off between completeness (expressivity) and complexity (runtime). A kernel that exploits characteristics of specific graph classes (e.g., graphs with bounded vertex degrees, such as molecular structure graphs) can take advantage of these characteristics to achieve a better trade-off on that graph class. Restriction to a specific graph class is a way to introduce domain knowledge; it is also related to ...
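Requirement (iii) can be checked empirically on a sample: a positive definite kernel yields a symmetric Gram matrix with no negative eigenvalues. The sketch below assumes a toy label-histogram kernel (a hypothetical helper, not a kernel from the text) and verifies these properties with NumPy.

```python
import numpy as np
from collections import Counter

def label_histogram_kernel(labels_g, labels_h):
    """Toy kernel: inner product of vertex-label count vectors (assumption)."""
    cg, ch = Counter(labels_g), Counter(labels_h)
    return sum(cg[label] * ch[label] for label in cg.keys() & ch.keys())

# A small sample of graphs, each given by its vertex labels.
graphs = [["C", "C", "O"], ["C", "O", "O", "H"], ["H", "H"], ["C", "H"]]

# Gram matrix K[i, j] = k(G_i, G_j).
K = np.array([[label_histogram_kernel(a, b) for b in graphs] for a in graphs],
             dtype=float)

# Symmetric, and all eigenvalues nonnegative (up to numerical tolerance):
print(np.allclose(K, K.T))                      # True
print(np.linalg.eigvalsh(K).min() >= -1e-9)     # True
```

Since this kernel is an explicit inner product, positive semidefiniteness holds by construction here; the same eigenvalue check is how one would probe a candidate similarity measure whose positive definiteness is not known a priori.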
