Introduction

Database management systems (DBMS) have been around for a long time, and each of us has preconceived notions about what they are and what they can be. These preconceptions vary depending on when we started our careers, whether we lived through the shift from hierarchical to relational databases, and whether we have worked with NoSQL yet. Our understanding of databases also varies depending on which areas of information technology we work in, ranging from transactional processing and web apps to business intelligence (BI) and analytics.

For example, those of us who started in the mainframe COBOL era understand hierarchical tree structures and processing flat files whose structures are defined inside a COBOL program. Curiously, many of us who have adopted cutting-edge NoSQL databases also have some understanding of hierarchical tree structures. Working on almost any system during the relational era ensures knowledge of SQL and relational data modeling around rows, columns, keys, and joins. A more rarefied group of us know ontology modeling, Resource Description Framework (RDF), and semantic or graph-based databases.

Each of these database¹ types has its own unique advantages. As data continues to grow in volume and variety, so, too, does our need to use this variety of formats and databases, and often to link the various data stores together using extract, transform, and load (ETL) jobs and data transformations.

Unfortunately, each new data store becomes a “technical silo,” separated from the others by boundaries that are both physical, because the data is stored in different places, and conceptual, because the data is stored in fundamentally different forms. Relational databases differ from non-relational (or not-only-relational) NoSQL databases, and both differ from graph databases and from other kinds of stores.

Until recently, this forced a difficult choice: choose the relational model, the document model, or a graph model; scale up or scale out; perform analytical or transactional work; or choose a few of these and cobble them together with ETL jobs and integration code.

Fortunately, the DBMS landscape is evolving rapidly. What organizations really want is a way to use all their data in an integrated way, so why shouldn’t database products support this out of the box? Integrated data storage and access—across data types and functions—is exactly the goal of multi-model database management platforms.

A multi-model database supports multiple data models in their natural form within a single, integrated backend, and uses the data standards and query standards appropriate to each model. Queries are extended or combined to provide seamless queries across all the supported data models. Indexing, parsing, and processing standards appropriate to each data model are included in the core database product.

This definition illustrates that simply storing various data types, as one can do in a binary large object (BLOB) column of a relational database management system (RDBMS) or in a filesystem directory, does not a multi-model database make. A true multi-model database can do the following:

  • Index data in natural ways for the models supported

  • Parse and index the inherent structure in self-describing data formats such as JSON, XML, and RDF

  • Implement standards such as query languages and validation or schema definition languages for the models supported

  • Provide integrated APIs that not only query the individual data models, but also query across multiple data models

  • Implement data processing languages native to each supported data model

Given these capabilities, a multi-model database does not require you to define the shape or schema of the data before loading it; instead, it uses the inherent structure of the data being stored. This makes data management flexible and adaptive, able to respond to the needs of downstream applications and to changing business requirements.
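To make “uses the inherent structure in the data” concrete, here is a minimal, in-memory sketch in Python. It is a toy, not how any particular multi-model product works: the class name `PathIndex` and the helper `flatten` are hypothetical. Each JSON document is parsed on load, and every scalar value is indexed under the path of keys that leads to it, so documents of different shapes can be loaded and queried without declaring a schema first.

```python
import json
from collections import defaultdict


def flatten(value, path=()):
    """Yield (path, leaf) pairs for every scalar in a parsed JSON value.

    Paths come from the document's own keys; no schema is declared anywhere.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, path + (key,))
    elif isinstance(value, list):
        for child in value:
            # Array elements share their parent's path, so any element can match.
            yield from flatten(child, path)
    else:
        yield path, value


class PathIndex:
    """Hypothetical toy inverted index from (JSON path, value) to document IDs."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)

    def load(self, doc_id, raw_json):
        doc = json.loads(raw_json)  # parse the self-describing format
        self.docs[doc_id] = doc
        for path, leaf in flatten(doc):
            self.index[(path, leaf)].add(doc_id)

    def find(self, path, value):
        return sorted(self.index.get((tuple(path), value), set()))


# Two documents with different shapes, loaded without any up-front schema.
store = PathIndex()
store.load("c1", '{"name": "Acme", "address": {"city": "Boston"}}')
store.load("c2", '{"name": "Globex", "tags": ["b2b"], '
                 '"address": {"city": "Austin", "zip": "73301"}}')

print(store.find(["address", "city"], "Boston"))  # ['c1']
print(store.find(["tags"], "b2b"))                # ['c2']
```

The second document adds `tags` and `zip` fields that the first lacks; the index simply picks them up from the data itself.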

With this understanding of what a multi-model database is, we can move on to what a multi-model database is for and describe some use cases. In principle, any system that stores and accesses different types of data can benefit from a multi-model database. An enterprise or other complex use case involving many existing data systems will naturally encounter many different data formats, so we will focus on data integration (silo-busting) as a key use case. Another scenario is integrating the handling of structured data with unstructured or semi-structured data. This has often been addressed by standing up a relational or NoSQL database and manually integrating it with a search platform, but both capabilities can be provided by a single multi-model database. We will also focus on one particular multi-model combination, documents with graph structures, which is a natural model for many domains with interrelated business entities.
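As a rough illustration of combining documents with graph structures, the following Python sketch keeps business entities as JSON-like documents and records the relationships between them as RDF-style (subject, predicate, object) triples. The entity IDs, the `placedBy` predicate, and the `open_order_total` function are all hypothetical; a real multi-model database would provide its own query languages for this. The point is only that one query walks a graph edge and then reads fields from the documents it reaches.

```python
from collections import defaultdict

# Document model: each business entity is a JSON-like dict keyed by ID.
documents = {
    "cust:1": {"name": "Acme Corp", "region": "East"},
    "cust:2": {"name": "Globex", "region": "West"},
    "order:10": {"total": 1200, "status": "shipped"},
    "order:11": {"total": 300, "status": "open"},
    "order:12": {"total": 950, "status": "open"},
}

# Graph model: RDF-style (subject, predicate, object) triples relate entities.
triples = [
    ("order:10", "placedBy", "cust:1"),
    ("order:11", "placedBy", "cust:1"),
    ("order:12", "placedBy", "cust:2"),
]

# Index triples by (predicate, object) so edges can be followed backwards.
subjects_by_pred_obj = defaultdict(list)
for subj, pred, obj in triples:
    subjects_by_pred_obj[(pred, obj)].append(subj)


def open_order_total(customer_id):
    """Sum the totals of a customer's open orders.

    Graph step: find the orders linked to the customer by `placedBy` edges.
    Document step: filter on and sum fields stored in those order documents.
    """
    orders = subjects_by_pred_obj.get(("placedBy", customer_id), [])
    return sum(
        documents[order]["total"]
        for order in orders
        if documents[order]["status"] == "open"
    )


for entity_id, doc in documents.items():
    if entity_id.startswith("cust:"):
        print(doc["name"], open_order_total(entity_id))
```

Run as is, this prints `Acme Corp 300` and `Globex 950`. Keeping relationships as triples rather than embedding order IDs inside customer documents means new kinds of links can be added without reshaping any document.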

Some Terms You’ll Need to Know

Table P-1 provides definitions of some terms that will come up frequently in this book.

Acknowledgments

A HUGE thank you to all the reviewers and contributors to this project. Thank you to Diane Burley, Damon Feldman, David Gorbet, James Kerr, Justin Makeig, Ken Krupa, Evelyn Kent, and Derek Laufenberg for all of your above-and-beyond contributions. This report would not be possible without all of your keen, discerning eyes and extraordinary additions. Thank you also to Parker Aven, my constant inspiration. Next weekend it’s just you, me, Legos, and movies, sweetheart!

1 For simplicity, we will sometimes blur the line between a “database” and a “database management system” and use the simpler term “database” where convenient.
