Chapter 1. Introduction
Graph data has become ubiquitous in the last decade. Graphs underpin everything from consumer-facing systems like navigation and social networks, to critical infrastructure like supply chains and policing. A consistent theme has emerged that applying knowledge in context is the single most powerful tool that most businesses have. Through research and experience, a set of patterns and practices called knowledge graphs has been developed to support extracting knowledge from data.
This report is for information technology professionals who are interested in managing and exploiting data for value. For the CIO or CDO, the report is brief yet thorough enough to provide an overview of the techniques and how they are delivered. For the data professional, data scientist, or software professional, this report provides an easy on-ramp to the world of knowledge graphs, and a language for discussing their implementation with peers and management.
Our fundamental tenet is that knowledge graphs are useful because they provide contextualized understanding of data. They achieve this by adding a layer of metadata that imposes rules for structure and interpretation. We’ll illustrate how using knowledge graphs can help extract greater value from existing data, drive automation and process optimization, improve predictions, and enable an agile response to changing business environments.
This chapter explains the background and motivation behind knowledge graphs. To do so, we’ll discuss graphs and graph data and show how we can build systems with smarter data using knowledge graph techniques.
What Are Graphs?
Knowledge graphs are a type of graph, so it’s important to have a basic understanding of graphs before we go much further. Graphs are simple structures where we use nodes (or vertices) connected by relationships (or edges) to create high-fidelity models of a domain. To avoid any confusion, the graphs we talk about in this book have nothing to do with visualizing data as histograms or plotting a function, which we consider to be charts, as shown in Figure 1-1.
The graphs we talk about in this book are sometimes referred to as networks. They are a simple but powerful way of describing how things connect.
Graphs are not new. In fact, graph theory was invented by the Swiss mathematician Leonhard Euler in the 18th century. It was created to help compute the minimum distance that the emperor of Prussia had to walk to see the town of Königsberg (modern-day Kaliningrad) by ensuring that each of its seven bridges was crossed only once, as shown in Figure 1-2.
Euler’s insight was that the problem shown in Figure 1-2 could be reduced to a logical form, stripping out all the noise of the real world and concentrating solely on how things are connected. He was able to demonstrate that the problem didn’t need to involve bridges, islands, or emperors. He proved that in fact the physical geography of Königsberg was completely irrelevant.
Using the superimposed graph in Figure 1-2, you can try to figure out the shortest route for walking around Königsberg without having to put on your walking boots and try it for real. In fact, Euler proved that the emperor could not walk the whole town crossing each bridge only once. Such a walk requires that no more than two land masses (nodes) have an odd number of connecting bridges (relationships), those two being where the walk starts and ends. In Königsberg, all four land masses had an odd number of bridges.
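Euler's argument comes down to counting the bridges that touch each land mass. In a connected graph, a walk that uses every relationship exactly once exists only when zero or two nodes have an odd number of relationships. The following is a minimal sketch of that check in Python. The node names A to D and the function names are placeholders we have chosen for illustration; they do not come from Figure 1-2.

```python
from collections import Counter

# The seven bridges as undirected relationships between four land masses.
# "A" stands for the central island; the names are illustrative only.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges from the island to one bank
    ("A", "C"), ("A", "C"),  # two bridges from the island to the other bank
    ("A", "D"),              # one bridge between the two islands
    ("B", "D"), ("C", "D"),  # one bridge from each bank to the second island
]


def degrees(edges):
    """Count how many relationships touch each node."""
    count = Counter()
    for start, end in edges:
        count[start] += 1
        count[end] += 1
    return count


def has_euler_walk(edges):
    """Apply the degree condition; assumes the graph is connected."""
    odd_nodes = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd_nodes) in (0, 2)


print(dict(degrees(BRIDGES)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(BRIDGES))  # False
```

Every land mass has an odd count, so no such walk exists, and the answer does not depend on the geography of the town at all.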
Building on Euler’s work, mathematicians have studied various graph models, all variations on the theme of nodes connected by relationships. Some models allow relationships to be directed, where they have an explicit start and end node, while some have undirected relationships connecting nodes. Some models, like hypergraphs, allow a single relationship to connect more than two nodes.
Some graph models like the property graph model allow both nodes and relationships to contain properties. A property consists of a name (also called a key) and a value. Properties on a node can be used, for example, to give a name to a node representing a person or coordinates to a node representing a vehicle. Properties on relationships can be used to store distances between road junctions or the number of times an algorithm has processed a relationship.
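To make the idea of properties concrete, here is a minimal sketch of how nodes and relationships with properties might be represented in Python. The class names, the junction names, and the distance value are all hypothetical, chosen only for illustration. This is not how any particular graph database stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Properties are name (key) and value pairs.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical example: two road junctions and the distance between them.
junction_a = Node({"name": "J1"})
junction_b = Node({"name": "J2"})
road = Relationship(junction_a, junction_b, {"distance_km": 1.5})

print(road.start.properties["name"])   # J1
print(road.properties["distance_km"])  # 1.5
```

Both kinds of element carry the same simple key and value structure, which is part of what makes the model easy to work with.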
Each of the graph models has its own quirks and benefits. In contemporary IT systems, enterprises have mostly settled on the property graph model. It’s a model that is well suited to common data management problems and straightforward for software and data professionals to work with. To illustrate the property graph model, we’ve created a tiny social graph in Figure 1-3, but compared to the example in Figure 1-2, this graph is far richer.
In Figure 1-3 each node has a label that represents its role in the graph. Some nodes are labeled Person and some labeled Place, representing people and places respectively. Stored inside those nodes are properties. For example, one node has name:'Rosa' and gender:'f' that we can interpret as being a female person called Rosa. Note that the Karl and Fred nodes have slightly different properties on them, which is perfectly fine too. If we need to ensure that all Person nodes have the same property keys, we can apply constraints to the label to ensure those properties exist, are unique, and so on.
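As a rough illustration of what an existence constraint on a label does, the sketch below checks that every node with a given label carries a set of required property keys. Graph databases typically enforce such rules declaratively; this hand-rolled check is only an approximation of the idea. The function name is ours, and the age property on Karl is a hypothetical stand-in, since the text does not list the properties of the Karl and Fred nodes.

```python
def missing_keys(nodes, label, required_keys):
    """Return (name, missing keys) for each node with `label` that lacks a key."""
    problems = []
    for node in nodes:
        if label in node["labels"]:
            missing = sorted(set(required_keys) - set(node["properties"]))
            if missing:
                problems.append((node["properties"].get("name"), missing))
    return problems


nodes = [
    {"labels": {"Person"}, "properties": {"name": "Rosa", "gender": "f"}},
    # Hypothetical properties; Figure 1-3 may show different ones for Karl.
    {"labels": {"Person"}, "properties": {"name": "Karl", "age": 64}},
    {"labels": {"Place"}, "properties": {"city": "Berlin"}},
]

print(missing_keys(nodes, "Person", ["name", "gender"]))  # [('Karl', ['gender'])]
```

The Place node is ignored because the check applies only to the Person label.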
Between the nodes in Figure 1-3 we have relationships. The relationships are richer than in Figure 1-2, since they have a type, a direction, and can have optional properties on them. The Person node with the name:'Rosa' property has an outgoing LIVES_IN relationship with property since:2020 to the Place node with city:'Berlin' property. We read this in slightly poor English as “Rosa lives in Berlin since 2020” and definitely not that Berlin lives in Rosa!
We also see that Fred is a FRIEND of Karl and that Karl is a FRIEND of Fred. Rosa and Karl are also friends, but Rosa and Fred are not. Relationships in property graphs are not symmetric.1 In most domains, relationships apply in one direction such that people own cars and cars do not own people.
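The relationships described for Figure 1-3 can be sketched as directed tuples. This is a simplification under our own assumptions: nodes are identified by name only, the helper function names are ours, and we assume the friendship between Rosa and Karl is stored as two relationships, as it is for Fred and Karl.

```python
# (start, type, end, properties); direction runs from start to end.
relationships = [
    ("Rosa", "LIVES_IN", "Berlin", {"since": 2020}),
    ("Fred", "FRIEND", "Karl", {}),
    ("Karl", "FRIEND", "Fred", {}),
    ("Rosa", "FRIEND", "Karl", {}),  # assumed to exist in both directions
    ("Karl", "FRIEND", "Rosa", {}),
]


def outgoing(start, rel_type):
    """Names of nodes reached by following rel_type away from start."""
    return [end for s, t, end, _ in relationships if s == start and t == rel_type]


def are_friends(a, b):
    """A mutual friendship needs one FRIEND relationship in each direction."""
    return b in outgoing(a, "FRIEND") and a in outgoing(b, "FRIEND")


print(outgoing("Rosa", "LIVES_IN"))    # ['Berlin']
print(outgoing("Berlin", "LIVES_IN"))  # []
print(are_friends("Rosa", "Karl"))     # True
print(are_friends("Rosa", "Fred"))     # False
```

Following LIVES_IN from Berlin returns nothing, which is the direction rule at work: Rosa lives in Berlin, and Berlin does not live in Rosa.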
In the property graph model, there are no limits on the number of nodes or the relationships that connect them. Some nodes are densely and intricately connected while others are sparsely connected, to match the problem domain. Similarly, some nodes have lots of properties, while some have few or none at all. Some relationships have lots of properties, but many tend to have none.
It’s easy to see how the graph in Figure 1-3 can answer questions about friendships and who lives where. Extending the model to include other important data items like interests, publications, or jobs is also straightforward. Just keep adding nodes and relationships to match your problem domain. Creating large, complex graphs with many millions or billions of connections is not a problem for modern graph databases and graph-processing software, so building even very large knowledge graphs is possible.
Graph data models are uniquely able to represent complex, indirect relationships in a way that is both human readable and machine friendly. Data structures like graphs might seem computerish and off-putting, but in reality they are created from very simple primitives and patterns. The combination of a humane data model and ease of algorithmic processing to discover otherwise hidden patterns and characteristics is what has made graphs so popular. It’s a combination we will exploit in our knowledge graphs.
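As a small example of discovering an indirect relationship, the sketch below finds friends of friends using the friendships described for Figure 1-3. The adjacency-set representation and the function name are our own simplifications for illustration.

```python
# Mutual friendships from Figure 1-3, stored as adjacency sets.
friends = {
    "Rosa": {"Karl"},
    "Karl": {"Rosa", "Fred"},
    "Fred": {"Karl"},
}


def friends_of_friends(person):
    """People two FRIEND hops away who are not already direct friends."""
    direct = friends.get(person, set())
    indirect = set()
    for friend in direct:
        indirect |= friends.get(friend, set())
    return indirect - direct - {person}


print(friends_of_friends("Rosa"))  # {'Fred'}
```

Rosa and Fred are not directly connected, yet a two-step traversal through Karl surfaces the link between them.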
Now that we’re comfortable with graphs, we move forward to interpreting connected data as knowledge.
The Motivation for Knowledge Graphs
There has been a recent explosion of interest in knowledge graphs, with a myriad of research papers, solutions, analyst reports, groups, and conferences. Knowledge graphs have become so popular partly because graph technology has accelerated in recent years but also because there is strong demand to make sense of data.
External factors have undoubtedly accelerated knowledge graphs to greater prominence. Stresses from the COVID-19 pandemic have strained some organizations to the point of breaking. Decision making has needed to be rapid, but businesses have been hampered by the lack of timely and accurate insight.
Businesses are reconfiguring their operations and processes to be ready to flex rapidly. As historical knowledge ages faster and is invalidated by market dynamics, many organizations need new ways of capturing, analyzing, and learning from data. We need to fuel rapid insights and recommendations across the business, from customer experience and patient outcomes to product innovation, fraud detection, and automation: we need contextualized data to generate knowledge.
Knowledge Graphs: A Definition
We now have an understanding of graphs and the motivation for using knowledge graphs. But clearly not all graphs are knowledge graphs. Knowledge graphs are a specific type of graph with an emphasis on contextual understanding. Knowledge graphs are interlinked sets of facts that describe real-world entities, events, or things and their interrelations in a human- and machine-understandable format.
Knowledge graphs use an organizing principle so that a user (or a computer system) can reason about the underlying data. The organizing principle gives us an additional layer of organizing data (metadata) that adds connected context to support reasoning and knowledge discovery. The organizing principle makes the data itself smarter, rather than locking away the tools to understand data inside application code. In turn this both simplifies systems and encourages broad reuse.
Organizing principles, reasoning, and knowledge discovery might seem intimidating and complicated at first. But in reality, we can think of knowledge graphs as an index over data that provides curation like a skilled librarian recommending pertinent books and journals to a researcher. An organizing principle acts as a contract between the provider and user of a knowledge graph, and Chapter 2 explores the options for organizing principles that we might use.
1 Human relationships such as love are often symmetric, but we express that symmetry with two relationships.