Introduction

Database management systems (DBMS) have been around for a long time, and each of us has a set of preconceived notions about what they are and what they can be. These preconceptions vary depending on when we started our careers, whether we lived through the shift from hierarchical to relational databases, and whether we have gained exposure to NoSQL yet. Our understanding of databases also varies depending on which areas of information technology we work in, ranging from transactional processing to web apps to business intelligence (BI) and analytics.

For example, those of us who started in the mainframe COBOL era understand hierarchical tree structures and processing flat files whose structures are defined inside a COBOL program. Curiously, many of us who have adopted cutting-edge NoSQL databases also have some understanding of hierarchical tree structures. Working on almost any system during the relational era ensures knowledge of SQL and relational data modeling around rows, columns, keys, and joins. A more rarefied group of us know ontology modeling, Resource Description Framework (RDF), and semantic or graph-based databases.

Each of these database¹ types has its own unique advantages. As data continues to grow in volume and variety, so, too, does our need to use this variety of formats and databases, and often to link the various data stores together using extract, transform, and load (ETL) jobs and data transformations.

Unfortunately, each new data store selected becomes a “technical silo”: a store separated from the others by boundaries that are both physical, because the data is stored in different places, and conceptual, because the data is stored in fundamentally different forms. Relational databases differ from non-relational (or not-only-relational, NoSQL) databases, and both differ from graph databases and from other stores.

Until recently, this forced a difficult choice: choose the relational model, the document model, or a graph-type model; scale up or scale out; perform analytical or transactional work; or choose a few and cobble them together with ETL jobs and integration code.

Fortunately, the DBMS landscape is evolving rapidly. What organizations really want is a way to use all their data in an integrated way, so why shouldn’t database products support this out of the box? Integrated data storage and access—across data types and functions—is exactly the goal of multi-model database management platforms.

A multi-model database supports multiple data models in their natural form within a single, integrated backend, and uses data standards and query standards appropriate to each model. Queries are extended or combined to provide seamless query across all the supported data models. Indexing, parsing, and processing standards appropriate to the data model are included in the core database product.

This definition illustrates that simply storing various data types, as one can do in a binary large object (BLOB) column of a relational database management system (RDBMS) or in a filesystem directory, does not a multi-model database make. A true multi-model database can do the following:

Index data in natural ways for the models supported
Parse and index the inherent structure in self-describing data formats such as JSON, XML, and RDF
Implement standards such as query languages and validation or schema definition languages for the models supported
Provide integrated APIs that not only query the individual data models, but also query across multiple data models
Implement data processing languages native to each supported data model

Given these capabilities, a multi-model database does not require you to define the shape or schema of the data before loading it; instead, it uses the inherent structure in the data being stored. This makes data management flexible and adaptive, able to respond to the needs of downstream applications and changing business requirements.
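To make the idea of loading data without a predefined schema more concrete, here is a minimal Python sketch. It is not how any particular product works; the class `TinyDocumentIndex`, the helper `iter_paths`, and the sample documents are hypothetical names invented for illustration. The sketch parses JSON documents of different shapes and indexes every path and value it finds, so the structure comes from the data itself rather than from an up-front definition.

```python
import json
from collections import defaultdict


def iter_paths(node, prefix=""):
    """Yield (path, scalar_value) pairs for every leaf of a parsed JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        # Array items share the path of the array itself.
        for item in node:
            yield from iter_paths(item, prefix)
    else:
        yield prefix, node


class TinyDocumentIndex:
    """Hypothetical toy store: no schema is declared before loading."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # (path, value) -> set of document ids

    def load(self, doc_id, raw_json):
        doc = json.loads(raw_json)
        self.docs[doc_id] = doc
        # The index is derived from whatever structure the document has.
        for path, value in iter_paths(doc):
            self.index[(path, value)].add(doc_id)

    def find(self, path, value):
        return sorted(self.index.get((path, value), set()))


store = TinyDocumentIndex()
# Two documents with different shapes are loaded side by side.
store.load("order-1", '{"customer": {"name": "Ada"}, "items": [{"sku": "A1"}, {"sku": "B2"}]}')
store.load("person-7", '{"name": "Ada", "roles": ["admin"]}')

print(store.find("/items/sku", "B2"))     # ['order-1']
print(store.find("/name", "Ada"))         # ['person-7']
print(store.find("/customer/name", "Ada"))  # ['order-1']
```

A real multi-model database would add much more (range indexes, text search, transactions), but the point of the sketch is that a new document shape needs no migration step before it becomes queryable.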

With this understanding of what a multi-model database is, we can move on to what a multi-model database is for and describe use cases. In general, any system that stores and accesses different types of data will benefit from a multi-model database. An enterprise or complex use case involving many existing data systems will naturally encounter many different data formats, so we will focus on data integration/silo-busting as a key use case. Another scenario is the integration of structured data handling with unstructured or semi-structured data. This has often been addressed by standing up a relational or NoSQL database and manually integrating it with a search platform, but both capabilities can be provided by a single multi-model database. We will also focus on a particular multi-model combination of documents with graph structures, which is a natural model for many domains with interrelated business entities.
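The combination of documents with graph structures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `documents` dictionary, the `triples` set, the predicate name `purchased`, and the functions `subjects` and `buyers_in_city` are all hypothetical and stand in for what a real product would expose through its own query languages. Entities are stored as documents, relationships between them are stored as subject-predicate-object triples, and a single query uses both.

```python
# Business entities stored as documents, keyed by identifier (hypothetical data).
documents = {
    "cust-1": {"type": "customer", "name": "Ada", "city": "London"},
    "cust-2": {"type": "customer", "name": "Grace", "city": "New York"},
    "prod-9": {"type": "product", "title": "Widget"},
}

# Relationships between entities stored as (subject, predicate, object) triples.
triples = {
    ("cust-1", "purchased", "prod-9"),
    ("cust-2", "purchased", "prod-9"),
    ("cust-1", "refers", "cust-2"),
}


def subjects(predicate, obj):
    """Graph side: every subject linked to `obj` by `predicate`."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def buyers_in_city(product_id, city):
    """Combined query: follow graph edges, then filter on document content."""
    buyer_ids = subjects("purchased", product_id)
    return sorted(
        documents[i]["name"]
        for i in buyer_ids
        if documents[i].get("city") == city
    )


print(buyers_in_city("prod-9", "London"))  # ['Ada']
```

In separate document and graph stores, the same question would require application code to query one system, carry identifiers across, and query the other; here both models live behind one call.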

Some Terms You’ll Need to Know

Table P-1 provides definitions of some terms that will come up frequently in this book.

Table P-1. Key terms related to multi-model databases
Term	Description
Multi-model	A multi-model database supports multiple data models in their natural form within a single, integrated backend, and uses data standards and query standards appropriate to each model. Queries are extended or combined to provide seamless query across all the supported data models. Indexing, parsing, and processing standards appropriate to the data model are included in the core database product. Document, graph, relational, and key-value models are examples of data models that can be supported by a multi-model database.
Multiquery engine	A query layer that allows multiple ways to query one data model.
Query language	A language designed to identify subsets of data in a database, and often to manipulate the data as the data is retrieved through joins, subselects, or other changes. Every data model other than text has a query standard, and even text query has natural, purpose-built query syntaxes.
	Model	Query language
	XML	XQuery for query; XSLT for manipulation
	JSON	JavaScript for manipulation
	RDF	SPARQL
	Relational	SQL
	Text	Search
Data indexing	All databases create a suite of indexes on data as it is ingested to allow fast querying of that data. A true multi-model database will have one integrated suite of indexes across data models, which allows a single, composable query to quickly retrieve data across all the data models simultaneously.
Canonical model	A type of data model that presents data entities and relationships in a standardized, simple form. Also known as a common data model.
Polyglot programming	Using several programming languages within a given application.
Polyglot persistence	Using several data models for different aspects of a system or enterprise. The polyglot persistence approach is motivated by the idea that data should be stored in the format and DBMS that best fits the data stored and the functionality required. Traditionally, this meant choosing a different DBMS for each type of data and having the application communicate with the right data store. However, a true multi-model DBMS provides polyglot persistence with a single, integrated backend.
Multiproduct multi-model	A multi-model system that exposes multi-model query languages and APIs but is powered internally by a collection of separate data stores. These products provide one simplified API for data access, but use a façade or orchestration layer atop multiple internal databases, which adds complexity and can affect the system’s consistency, redundancy, security, and scalability.
Shared nothing (SN) architecture	A distributed computing architecture in which each node is independent and self-sufficient, and there is neither a single point of contention across the system, nor a single point of failure. More specifically, none of the nodes share memory or disk storage.
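As an illustration of the last entry in Table P-1, the following Python sketch models a shared nothing layout in miniature. It is a simplified teaching example, not a description of any product: the `Node` and `Cluster` classes and the node names are hypothetical, and real systems also handle replication, rebalancing, and failures, which are omitted here. Each node owns a private storage dictionary, keys are assigned to nodes by a stable hash, and a search is scattered to every node and the partial results gathered.

```python
import hashlib


class Node:
    """One independent node; its storage is never shared with other nodes."""

    def __init__(self, name):
        self.name = name
        self.storage = {}

    def put(self, key, value):
        self.storage[key] = value

    def get(self, key):
        return self.storage.get(key)

    def scan(self, predicate):
        return [v for v in self.storage.values() if predicate(v)]


class Cluster:
    """Hypothetical router: a stateless function any client could run."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _owner(self, key):
        # A stable hash (unlike Python's salted built-in hash for strings)
        # sends the same key to the same node on every run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        self._owner(key).put(key, value)

    def get(self, key):
        return self._owner(key).get(key)

    def search(self, predicate):
        # Scatter the query to every node, then gather the partial results.
        results = []
        for node in self.nodes:
            results.extend(node.scan(predicate))
        return results


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(20):
    cluster.put(f"doc-{i}", {"id": i, "even": i % 2 == 0})

print(cluster.get("doc-7"))
print(len(cluster.search(lambda d: d["even"])))
```

Because each key has exactly one owner and no memory or disk is shared, adding nodes adds capacity without introducing a shared resource that every request must pass through.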

Acknowledgments

A HUGE thank you to all the reviewers and contributors to this project. Thank you to Diane Burley, Damon Feldman, David Gorbet, James Kerr, Justin Makeig, Ken Krupa, Evelyn Kent, and Derek Laufenberg for all of your above-and-beyond contributions. This report would not be possible without all of your keen, discerning eyes and extraordinary additions. Thank you also to Parker Aven, my constant inspiration. Next weekend it’s just you, me, Legos, and movies, sweetheart!

¹ For simplicity, we will sometimes blur the line between a “database” and a “database management system” and use the simpler term “database” where convenient.


Building on Multi-Model Databases by Pete Aven
