Chapter 1. Toward Holistic Metadata Management

Anything can be metadata.

There will never be one globally accepted and applied standard for metadata. Insisting on enterprise-wide acceptance of a single metadata standard before implementing metadata management is counterproductive.

Metadata repositories do not just describe data. While they describe data in data sources, metadata repositories also go beyond traditional data management by describing physical things like servers, laptops, phones, and cables, as well as non-physical things such as processes, capabilities, ideas, and intelligence.

An enterprise-wide metadata repository is impossible. No software can provide a complete view of the IT landscape. Each metadata repository offers a necessary perspective on the IT landscape, serving a specific function but not the entire view. Accepting this allows us to use metadata repositories in an impactful and strategic manner.

There are many metadata repositories in a company; most of them are monoliths. Metadata repositories typically claim to provide a complete view of the IT landscape, but they don’t. This makes them function like monoliths, each presenting a single world view and claiming to be the entire truth while ignoring each other’s perspectives.

Metadata repositories are maintained by three disciplines: data, information, and knowledge management. It is a widespread view in data management that metadata management is about managing data—even though this is technically correct, metadata repositories are used in the disciplines of data, information and knowledge management. And they all describe the IT landscape.

You need a data discovery team to organize and search metadata. This team will coordinate metadata repositories and facilitate enterprise-wide searches for all metadata, across all metadata repositories.

Metadata management must ignite a third, small wave of data decentralization: the Meta Grid. Connecting metadata repositories in a meta grid gives you a more robust overview of your IT landscape by applying an architecture similar to that of microservices and data mesh. Think of the meta grid as a way of avoiding the single-view-of-the-world monolith that metadata repositories tend to turn into. The meta grid will make metadata consumable in small containers, from the one source in which it is defined, and usable in other metadata repositories.

In this chapter, I’ll discuss the three main points of the book: that the three management disciplines of data, information, and knowledge management all use metadata repositories; that these metadata repositories can be coordinated by a data discovery team; and that the meta grid architecture can create more logical, robust metadata management. As such, this chapter is a condensed version of the entire book, and therefore an introduction to a way of thinking that can make you succeed with metadata.

Let’s begin by discussing the topic at hand: Metadata Management.

Metadata Management Reinterpreted

Metadata can be defined as:

A description that is both attached with what is described and placed somewhere else, to make what is described discoverable and manageable.

Subsequently, metadata management can be defined as:

The activity of identifying or creating, storing, searching, sharing, and ultimately deleting metadata. Metadata management is performed with metadata repositories that serve as the places to discover and manage what the metadata describes.

We will discuss and expand on these definitions and the related literature in Chapter 2.
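To make the activities in the definition concrete, here is a minimal sketch of a metadata repository that supports storing, searching, sharing, and deleting. The class names, field names, and the identifier `db.sales.CUST` are assumptions made for illustration only; real metadata repositories are far richer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """A description kept apart from the thing it describes."""
    identifier: str                     # points back to what is described
    description: str
    tags: set[str] = field(default_factory=set)


class MetadataRepository:
    """Hypothetical in-memory repository covering the lifecycle activities."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: dict[str, MetadataRecord] = {}

    def store(self, record: MetadataRecord) -> None:
        # Identify or create, then store.
        self._records[record.identifier] = record

    def search(self, term: str) -> list[MetadataRecord]:
        # Match the term against descriptions and tags, ignoring case.
        term = term.lower()
        return [
            r for r in self._records.values()
            if term in r.description.lower()
            or term in {t.lower() for t in r.tags}
        ]

    def share(self, identifier: str) -> dict:
        # Export a record together with the repository it came from.
        r = self._records[identifier]
        return {
            "id": r.identifier,
            "description": r.description,
            "tags": sorted(r.tags),
            "source": self.name,
        }

    def delete(self, identifier: str) -> None:
        # Deleting the metadata does not delete what it describes.
        self._records.pop(identifier, None)


repo = MetadataRepository("catalog")
repo.store(MetadataRecord("db.sales.CUST", "Customer master table", {"customer"}))
hits = repo.search("customer")
```

Note that the record only points to the table through its identifier, which is what makes the described thing discoverable without being stored in the repository itself.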

The way metadata management is performed in many companies is closely linked to data management literature only. This has created a narrow way of performing metadata management, with some fatal blind spots that will be discussed in this book. In data management literature, metadata management is commonly defined as the practice of managing data through a metadata layer. This interpretation is logical, given that data management is about managing…data! A notable resource advocating this perspective is the Data Management Body of Knowledge (DMBOK). In the chapter on metadata management, the DMBOK states that:

Metadata is essential to data management as well as data usage (...) All organizations produce and use a lot of data (...) but no individual will know everything about the data1

The view that metadata management is about managing data is incorporated in this book—and the role of metadata is indeed to provide an overview that no individual alone can be expected to have and maintain.

Nevertheless, this book is different from the DMBOK and most other data management literature, and quite substantially so. This book is about more than representing data at the metadata layer. It is also about the physical things that the IT landscape consists of, as well as the non-physical things that the IT landscape facilitates.

In light of this, this book is about contextualizing metadata repositories for data, information, and knowledge management to each other, as these management disciplines altogether use a plethora of metadata repositories to control the IT landscape, each with their distinct purpose.

As mentioned, metadata management—as practiced in the discipline of data management—is generally focused on the relation between data sources and the metadata repository. In this line of thinking, metadata management is about learning to represent data as metadata, as displayed in Figure 1-1.

Figure 1-1. The traditional approach to metadata management in data management.

The traditional approach to metadata management focuses on representing data sources in a metadata repository. Consequently, there are dense, scholarly discussions on the many ideal standards for metadata, aiming to define the perfect way of representing data as metadata, as stated in the ISKO encyclopedia:

Metadata standards are commonly organized around a set of elements (such as “title”, “author”, “date”) that manifest as computer-readable documents in one of an alphabet-soup set of formats and mark-up languages, such as MARC, XML, JSON, and YAML.2

The list of metadata standards is long.3 This book will not make an attempt on a new standard for metadata—that is not needed.

In the established data management practice of metadata management, there are also technological discussions about the most effective ways to provide a metadata overview of the data sources in a company. Many metadata repositories come with fixed metamodels, meaning predefined relational structures of how data should be organized at the metadata layer (we discuss metamodels in Parts II and III). However, a powerful alternative to fixed metamodels exists: the knowledge graph powered metadata repository. This is depicted in Figure 1-2.

Figure 1-2. The knowledge graph powered metadata repository (likely a data catalog).

Metadata repositories powered by knowledge graphs will typically be data catalogs, and I describe them in my book The Enterprise Data Catalog. They are very useful: their flexibility makes them scalable and allows them to adjust to your IT landscape. Use them!

A knowledge graph allows you to link and expose data sources. In such cases, data sources are typically relational data sources, meaning structured data stored in tables and schemas in databases. With a knowledge graph, you could, for example, connect related tables and columns to each other. A table named CUST could have a column named CUST_ID, and another table named CSTMR could have a column named C-NUM. CUST and CSTMR can both be connected to the business concept Customer, while CUST_ID is equivalent to customer identifier and C-NUM is equivalent to Customer Number. At the metadata layer, in a graph, this would give us a more complete picture of a customer altogether.4
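The CUST and CSTMR example can be sketched as a tiny set of subject-predicate-object triples. This is only an illustration in plain Python: the predicate names `has_column`, `represents`, and `equivalent_to` are assumptions, and a real data catalog would use a graph database or an RDF store rather than a list.

```python
# Each triple is (subject, predicate, object). Predicate names are
# hypothetical and chosen only for this example.
TRIPLES = [
    ("CUST", "has_column", "CUST_ID"),
    ("CSTMR", "has_column", "C-NUM"),
    ("CUST", "represents", "Customer"),
    ("CSTMR", "represents", "Customer"),
    ("CUST_ID", "equivalent_to", "customer identifier"),
    ("C-NUM", "equivalent_to", "Customer Number"),
]


def objects(subject, predicate):
    """Return everything the subject points to via the predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return everything that points to the object via the predicate."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def describe_concept(concept):
    """Collect the tables linked to a business concept, with each
    table's columns and the business terms those columns map to."""
    picture = {}
    for table in subjects("represents", concept):
        picture[table] = {
            column: objects(column, "equivalent_to")
            for column in objects(table, "has_column")
        }
    return picture


customer_picture = describe_concept("Customer")
```

Asking for the concept Customer returns both tables and their columns, even though the two tables use different naming conventions. That is the more complete picture the graph provides at the metadata layer.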

There are many excellent books written on this subject. For an introduction, take a look at Chapter 8 in Building Knowledge Graphs.5 This is indeed metadata management! However, this book is different. I’ll aim to explain how to coordinate all metadata repositories in a company, instead of discussing the relation between the data layer and the metadata layer.

All metadata repositories build an understanding of the IT landscape into themselves. They do so at several levels: in the aforementioned metamodels, in naming conventions for applications, processes, integrations, and so on, and in the definitions of these. However, as you will see later on, almost all metadata repositories are built in such a way that they promote a single view of the IT landscape, namely their own, as depicted in Figure 1-3:

Figure 1-3. Metadata repositories have single views of the IT landscape.

The single view of the IT landscape creates a significant problem for metadata management because companies do not have one but several metadata repositories. They all have their distinct view of the IT landscape, and these views most often do not match. As depicted in Figure 1-4, this becomes a problem because the metadata repositories manage overlapping parts of the IT landscape. And, as these repositories do not match, the truth about the IT landscape dissolves. Most importantly, the truth dissolves at a scale so massive that verifying the actual state of the IT landscape becomes impossible.

Note

Can you remember the weird chemist from the preface? The guy with thick glasses, the satanic smile, and the machine gun laugh? You got it—this is what makes him laugh.

This reality is the reason behind the reinterpretation of metadata management put forward in this book. This book is about the coordination of metadata repositories more than the representation of data in the metadata layer in one given technology, such as a graph. This is a balance, because what is discussed is, technically, still data and metadata. But the “data layer” in this book consists only of metadata from metadata repositories: lists of applications, process maps, capabilities, project names, and so on, stored in metadata repositories and used in decision making of substantial innovative, operational, financial, and protective importance.

Companies at large suffer from poor implementations of these metadata repositories. Furthermore, the same tasks of mapping the IT landscape in these repositories are repeated again and again, most times in vain. This is what Fundamentals of Metadata Management wants to change. Accordingly, what is discussed in this book is the actual reality in companies, not an ideal. In short, this book discusses the right part of Figure 1-4: the reality of many metadata repositories that to a certain degree overlap, but far too often are not coordinated and thus contain multiple truths about the IT landscape. This reality can be changed by understanding and coordinating metadata repositories, and doing so will create a deeper, more holistic view of the IT landscape.

Figure 1-4. This book deals with the reality of multiple metadata repositories, to the right.

Thus, metadata management must be reinterpreted in three ways:

  • Metadata repositories are for Data, Information, and Knowledge Management

  • Metadata repositories must be coordinated by a Data Discovery Team

  • Metadata repositories should be connected in a Meta Grid

Take a look at the table of contents. Besides Chapter 1, the book has these three bullets as its three parts. It’s basically the topic of the book.

Metadata repositories are for Data, Information, and Knowledge Management. This book goes beyond the typical view of metadata management as being about managing only data. Instead, I argue that metadata management is about data, information, and knowledge management. Technically, metadata can represent data in databases connected to applications, but metadata can also represent physical and non-physical aspects of the IT landscape. These physical and non-physical things are managed in disciplines found in enterprises that describe the IT landscape. Accordingly, this book builds on the reality that metadata management is rooted not only in the discipline of data management but also in information management and knowledge management. These three well-established management disciplines have distinct metadata repositories that create various overviews of your company’s IT landscape. Metadata repositories for information management and knowledge management exist, but surprisingly, they have so far not been taken into account when setting up holistic metadata management practices in companies. This is because metadata management as a discipline is very closely connected to data management, and this is one of the blind spots in metadata management. The discipline often overlooks many repositories that should be included because it originated from data management rather than emerging as a separate discipline dedicated to employing metadata management methodologies to map the IT landscape. We will discuss metadata repositories for data, information, and knowledge in Part I, that is, Chapters 3 through 5.

Note

This is why I was motivated to write this book. My academic background is in Library and Information Science (LIS), and metadata is naturally discussed as completely detached from data in LIS. Instead, metadata describes physical objects such as books and captured wildlife. I’ll discuss this in more depth in Chapter 2.

Metadata repositories must be coordinated by a Data Discovery Team. Metadata management is not a task that can be carried out in isolation. However, that is how it is carried out today in most companies. This results in a reality with multiple, uncoordinated metadata repositories that are not connected and therefore depict the same IT landscape in various ways. This is not only expensive in terms of time lost doing the same analysis of the IT landscape again and again. It also produces multiple realities of the IT landscape, leading to wrong decisions and the waste of substantial amounts of money, as technologies are bought, maintained, and preserved without firm knowledge of the necessity of doing so. The Data Discovery Team can change this because it will map not only the IT landscape but also the metadata repositories that map the IT landscape. In doing so, the Data Discovery Team can coordinate the representation of the IT landscape across all metadata repositories. Furthermore, the Data Discovery Team will allow for a hitherto unseen, powerful enterprise-wide search across all metadata repositories for data, information, and knowledge. We will discuss the Data Discovery Team further in Chapters 6 and 7.

Metadata repositories should be connected in a Meta Grid. This book also differs from the usual understanding of metadata management in that it is not primarily focused on perfecting the metadata representation of the IT landscape in one or a couple of metadata repositories through the use of standards and technologies. Instead, at its center is the interplay between the plethora of metadata repositories that exist in every company of more than a few thousand employees. This requires a brief detour to explain. Microservices and Data Mesh are ways of thinking about IT that liberate technological capabilities from big, unmanageable solutions that can be thought of as monoliths. Instead, microservices and data mesh establish the smallest possible units of operational and analytical data in order to create fast flow, flexibility, and transparency. Altogether, this kind of thinking enables companies to reinvent themselves and increase their competitive advantages. I’ll argue that, just as microservices and data mesh suggest abandoning centralized monoliths, metadata can currently be considered managed in centralized monoliths. The silos of metadata, as such opaque monoliths, create, when seen as a whole, a dysfunctional cacophony of opposing realities about the IT landscape. I’ll explain how to counter that reality and get a deeper understanding of a company’s IT landscape through a new way of performing metadata management, in a structure that, like microservices architecture and data mesh, connects small units of data (in this case, metadata) as products, that is, metadata products. This proposed structure is a Meta Grid. It is not a 1:1 match with data mesh for analytical data, nor with microservices for operational data. The Meta Grid aims to help you succeed with a smooth, simple, and robust representation of your IT landscape across all the metadata repositories in your company. We will discuss the Meta Grid in Chapters 8 through 11.
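As a first intuition only, the idea of defining metadata in one source and making it usable in other repositories can be sketched as follows. This is one possible reading, not the architecture described later in the book: the class names, the shared `grid` dictionary, and the keys such as `application:payroll` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataProduct:
    """A small unit of metadata with exactly one defining repository."""
    key: str
    value: dict
    owner: str   # name of the repository in which it is defined


class GridRepository:
    """Hypothetical repository that publishes to and consumes from a grid."""

    def __init__(self, name, grid):
        self.name = name
        self._grid = grid          # shared registry: key -> MetadataProduct
        self._consumed = set()     # keys this repository uses but does not own

    def publish(self, key, value):
        # Only the defining repository may publish or update a product.
        existing = self._grid.get(key)
        if existing is not None and existing.owner != self.name:
            raise ValueError(f"{key} is already defined in {existing.owner}")
        self._grid[key] = MetadataProduct(key, value, self.name)

    def consume(self, key):
        # Reference the product instead of copying and redefining it.
        if key not in self._grid:
            raise KeyError(key)
        self._consumed.add(key)

    def view(self):
        # Consumed products are always read from their defining source.
        return {key: self._grid[key] for key in sorted(self._consumed)}


grid = {}
eam_tool = GridRepository("eam_tool", grid)
data_catalog = GridRepository("data_catalog", grid)

eam_tool.publish("application:payroll", {"name": "Payroll", "owner": "HR"})
data_catalog.consume("application:payroll")
seen_by_catalog = data_catalog.view()
```

The design choice worth noticing is that the data catalog never holds its own copy of the application entry. If the enterprise architects update the definition, every consuming repository sees the same, single version.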

Let’s unfold these points even more. Let’s look at the management disciplines that use metadata repositories to describe the IT landscape.

Data, Information, and Knowledge Management

Metadata repositories are used in large enterprises, in three distinct management disciplines:

  • data management

  • information management

  • knowledge management

These are all requested for regulatory, financial, operational, and innovative purposes. In short, metadata management for data, information, and knowledge exists, and it is not up for discussion whether companies should perform it in these three management disciplines. It is already being done today all around the world. Besides the literature inside these three management disciplines (discussed below), you’ll also find this distinction in, for example, Enterprise Architecture literature.6

Before we dive into a description of the management disciplines of data, information, and knowledge, there are two typical assumptions about such a discussion that this book does not share:

  1. This is not a book that seeks to explain and distinguish Data, Information, and Knowledge in elaborate, intellectual discussions. It is not the point of this book, as it is a simple fact that established management disciplines for these three concepts exist, complete with their metadata repositories. This is what is discussed in this book: it’s not a philosophical exploration of data, information and knowledge as concepts.

  2. This is not a book that seeks to define and prove a causality from data to information to knowledge, and beyond. Often you see models of causality from data over information to knowledge. The most widespread model is the Data-Information-Knowledge-Wisdom (DIKW) pyramid illustrated in Figure 1-5.

Figure 1-5. The DIKW pyramid, Data, Information, Knowledge, Wisdom.

The DIKW suggests a human interpretation of raw, objective data into information which then becomes knowledge and finally wisdom. This model is widely documented in a vast body of literature, supporting the idea that analytical insights can be gained from data.7

A little less known is the Data-Information-Knowledge-Action-Reaction (DIKAR)8 model illustrated in Figure 1-6.

Figure 1-6. The DIKAR model, Data, Information, Knowledge, Action, Reaction.

The DIKAR model is a bit more pragmatic than the DIKW, in the sense that the end goal is not wisdom but simply a push-pull of actions and reactions.

Both models suggest a causality where data leads to knowledge, and it is correct that this causality exists. However, it is overly data- and technology-centric to believe that information, knowledge, wisdom, and actions/reactions are always rooted in data. This perspective defies the reality of human perception: Knowledge can be obtained just as easily through reflections on general human experience as it can be derived from data.

Therefore, this book acknowledges the causality in data-information-knowledge, but not as the single system to obtain knowledge. That would be scientifically naive. More importantly, it distorts the reasons why management disciplines for data, information, and knowledge exist: they have not been established to support this causality, but to manage vital aspects of the IT landscape. Even more importantly, I argue that these management disciplines have evolved in isolation from each other, which leads to organizational confusion and multiple opposing depictions of the IT landscape altogether.

In sum, this book does not engage in a lengthy philosophical discussion on data, information and knowledge, nor does it seek to prove a causality from data over information to knowledge. Instead, this book relies on the fact that the management disciplines for data, information, and knowledge exist and that they all have a subset of repositories that is found in most companies.

So how do these three distinct management disciplines use metadata repositories?

Data management uses a set of technologies to manage data. A subpart of these technologies is used to store, extract, transform, observe, and ingest data across the IT landscape. These technologies contain small metadata repositories within them, without necessarily being metadata repositories primarily (we will discuss the subtle distinction in Chapter 2). This subpart of the data management toolset was, in the late 2010s and early 2020s, referred to as the modern data stack. However, the term modern data stack is declining in usage and has been declared dead.9 A simple reason may be that the modern data stack was too expensive and delivered too little value. Nevertheless, data management also relies on other metadata repositories, for example to perform strategic planning of enterprise architecture, maintain the existing IT infrastructure, and preserve the immediate past in backup systems. Several maps of data management technologies (including metadata repositories) exist. The most exhaustive and well known is the Machine Learning, Artificial Intelligence & Data (MAD) Landscape, by Matt Turck. Data management is a well-established discipline organized around the Global Data Management Community (DAMA), which also publishes the aforementioned DAMA DMBOK.

Information management is a smaller discipline than data management. It is also closer to less technologically complex storage solutions. Besides data, it is focused on non-digital, physical objects, like paper and specimens, as well as concepts that are more abstract than tangible, such as business processes and capabilities. Rather than one big community, information management is divided into several subdisciplines, that comprise:

And many more. As a discipline, Information Management consists of the interpretation of data (from the IT landscape) into larger chunks more understandable by humans, e.g. as records. Information Management also executes more practical dimensions of knowledge management, in terms of e.g. quality assurance. The metadata repositories used for information management are to a large extent of regulatory and operational purpose, more than innovation. Information Management has a vast body of theoretical literature supporting it.10

You may wonder if information management really differs from data management. That is fair, but as mentioned in the warning above, that is a philosophical discussion. In our case, you must keep in mind that Information Management is practiced as a discipline, with standards, technologies, and regulatory responsibilities. It exists, and if you ignore it, you lose the capacity to fully understand your IT landscape.

Note

Information Management technologies are usually depicted in RegTech maps, or as part of GovTech maps.

Knowledge management is the methodology to capture knowledge stored in human minds. As such, it is considered a subpart of Human Resources in the ISO organization with its own working group, ISO/TC 260/WG 6 Knowledge Management. The task at hand is to capture and store more permanently the knowledge that humans accumulate. Subsequently, that knowledge is stored in knowledge management technologies containing metadata repository dimensions. These knowledge management technologies can disseminate and teach that knowledge to more people. Just like for data and information, knowledge management operates with a set of technologies—however, no single technology map lists them. The best overview is provided by Ed Tech Maps.

Unlike the distinction between data and information, philosophy is relevant in the context of knowledge management. To capture knowledge, we must know what to look for, and philosophy provides answers to this. From the ancient philosopher Aristotle, we have inherited three types of knowledge in the Nicomachean Ethics:11

  • Episteme is scientific, theoretical knowledge. It’s knowledge that requires thinking.

  • Techne is the knowledge of practical arts and crafts. Knowledge that requires actions and has physical outcomes.

  • Phronesis is the knowledge of ethical considerations dedicated to judging good actions.

As you will see, these types of knowledge hold universal truths and are directly traceable to the knowledge management technologies implemented in companies in our era.

Furthermore, types of knowledge can be subdivided into subtle categories such as Explicit, Implicit, Tacit, Procedural, Declarative, A Posteriori, and A Priori knowledge. The last two refer, respectively, to knowledge obtained from personal experience and the capability to think abstractly. Knowledge management technologies directly adapt these philosophical distinctions to deliver functional software, such as GURU.

Stephanie Barnes has put forward an avant-garde approach to knowledge management, known as Radical Knowledge Management, that seeks to capture knowledge through art-based interventions. The approach is powerful, as knowledge is difficult to capture.12

In almost all companies, the three management disciplines of data, information, and knowledge operate in isolation from each other. This has severe consequences, as they all depict, to a certain degree, the IT landscape upon which they rely to perform their tasks.

But there is a solution to bring them together—a new kind of team.

The Data Discovery Team

Readers of my previous book, The Enterprise Data Catalog, and my newsletter, Symphony of Search, will know that I approach technology by suggesting new perspectives based on my academic background in Library and Information Science.13 That is also true for this book.

Accordingly, pivotal for this book is the notion of the reference librarian. The reference librarian is a function that consolidates a topic based on many sources. This person is responsible for providing the deepest, most complete answer to any information needs that library users may have. The reference librarian will assess a variety of openly listed sources to gather an answer to even very big information needs—this discipline is especially important in an education and research context.14 They push researchers and students forward on complex endeavors with an opaque amount of data that has to be filtered to provide the most precise context. As you can see in Figure 1-7, this approach differs from the “monolithic” approach of having “one single source of truth” that can answer all questions.

Figure 1-7. The reference librarian will use many sources to create a complete answer.

Note

My idea of coordinating metadata repositories is naturally inspired by LIS literature. However, the management disciplines that help control data, information, and knowledge play an equally important role.

The reality of asking questions and getting answers from one or many sources is also at play for metadata repositories. Let’s take the example of an application. We could imagine a simple question like this one:

“What is an application and what applications do we have in this company?”

If you ask the team of enterprise architects, they have a firm answer about what defines an application. Along with the definition of application, they also have a complete list of applications in their Enterprise Architecture Management (EAM) tool, which is their metadata repository and their single source of truth about the IT landscape.

However, do the facts in the EAM tool match the reality of other metadata repositories? What would happen if we asked another team what an application is and which applications we have, and that team then used their metadata repository to answer us? Can we expect these metadata repositories to match exactly? Unfortunately, that is seldom the case, as metadata repositories are built and maintained in silos by teams all over the company with little or no communication between them.

To address this challenge, you should respond like a reference librarian. You need to coordinate metadata repositories to provide solid answers (Figure 1-8).

Figure 1-8. Providing a solid answer for Application based on multiple metadata repositories.
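A reference-librarian style answer to the application question can be sketched by consolidating the lists from several repositories and reporting where they disagree. The repository names other than the EAM tool, the application names, and the simple name normalization are assumptions made for this example; real matching would need far more care than lowercasing.

```python
from collections import defaultdict


def normalize(name):
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def consolidate(repositories):
    """Map each application to the set of repositories that list it."""
    seen = defaultdict(set)
    for repository, applications in repositories.items():
        for application in applications:
            seen[normalize(application)].add(repository)
    return dict(seen)


def disagreements(repositories):
    """Return applications missing from at least one repository,
    together with the repositories that do not list them."""
    everyone = set(repositories)
    return {
        application: everyone - listed_in
        for application, listed_in in consolidate(repositories).items()
        if listed_in != everyone
    }


# Hypothetical application lists held by three teams.
repositories = {
    "eam_tool": ["Payroll", "CRM Portal"],
    "cmdb": ["payroll", "Ticketing"],
    "data_catalog": ["CRM Portal", "Payroll "],
}

agreed = consolidate(repositories)
gaps = disagreements(repositories)
```

In this made-up data, only Payroll is known to all three teams. The gaps show exactly which repository is missing which application, which is the starting point for coordinating them.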

Think of these repositories as part of a grid, all connected so they constitute a collective truth and not isolated, opposing truths. Such a reality is, in a nutshell, what this book seeks to establish for your company IT landscape. We’ll discuss this further in Chapters 3 through 5.

You need to understand that metadata repositories are normally managed as monoliths, so let’s unfold that—and have a first glimpse of a new, decentralized architecture for metadata management.

The Meta Grid: The Third Wave of Data Decentralization

The DAMA DMBOK displays two architecture diagrams for metadata management (Figure 1-9):15

Figure 1-9. The two styles for Metadata Management in the DAMA DMBOK.

Both diagrams include a metadata portal that allows searching for metadata across the typical subset of metadata repositories discussed in data management. This can be done either directly within the metadata repositories or through a middle layer known as an Enterprise Metadata Repository.

However, both architectures have problems. The Centralized Metadata Architecture suggests pulling all metadata into a shared repository, while the Distributed Metadata Architecture suggests searching metadata directly across both sources.

But neither of them addresses the fact that metadata overlaps across the various repositories. This overlap creates redundant, uncoordinated repositories. Both are purely technical architectures and do not take into account a crucial aspect of how metadata is managed.

As such, each metadata repository can be considered a monolith (more on this in Chapter 6), unrelated to the others, and in the case of the enterprise-wide metadata repository, this also represents a new monolith on top. This means that metadata shared across repositories is not considered an independent entity; it is only viewed in the context of the repository in which it is represented. As a result, the same metadata is listed again and again, almost always misaligned and imperfect.

Accordingly, this book suggests decentralizing metadata into a meta grid. To understand this concept, we need to briefly review the data decentralization strategies proposed over the past few decades.
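The overlap problem can be illustrated with a small sketch of both search styles as they are described above. The repository contents and function names are hypothetical and do not reproduce the DMBOK diagrams; the point is only that both styles return the same application twice, with conflicting owners.

```python
# Two hypothetical repositories that both describe the Payroll application.
REPOSITORIES = {
    "eam_tool": [
        {"name": "Payroll", "type": "application", "owner": "HR"},
    ],
    "cmdb": [
        {"name": "Payroll", "type": "application", "owner": "Finance"},
        {"name": "srv-01", "type": "server", "owner": "IT Ops"},
    ],
}


def build_central_repository(repositories):
    """Centralized style: copy every entry into one shared repository."""
    return [
        {**entry, "source": source}
        for source, entries in repositories.items()
        for entry in entries
    ]


def centralized_search(central, term):
    """Search the shared copy."""
    return [e for e in central if term.lower() in e["name"].lower()]


def distributed_search(repositories, term):
    """Distributed style: query each repository where it lives."""
    hits = []
    for source, entries in repositories.items():
        hits.extend(
            {**entry, "source": source}
            for entry in entries
            if term.lower() in entry["name"].lower()
        )
    return hits


central = build_central_repository(REPOSITORIES)
central_hits = centralized_search(central, "payroll")
distributed_hits = distributed_search(REPOSITORIES, "payroll")
owners = {hit["owner"] for hit in distributed_hits}
```

Neither function decides which owner is correct. Both simply surface the duplicated entry from each repository, which is why the choice of search architecture alone does not resolve the overlap.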

Learning from hexagonal architectures toward a Meta Grid

The past decades have seen two waves of decentralization that successfully managed to break up gigantic data monoliths inside companies. These two waves were microservices and data mesh.

As discussed, I proposed a new approach to data decentralization, which I call the Meta Grid. Unlike previous methods like microservices and data mesh, the Meta Grid aims to break down large data systems into smaller, more manageable pieces. Let’s take a brief look at these two earlier waves to better understand the concept of the Meta Grid.

Microservices break up the monoliths of operational data. Operational data describes the data that runs companies, making sure that the value chain of the company is functioning smoothly. It traditionally sits in big technology components, like Enterprise Resource Planning (ERP) systems, Customer Relationship Management (CRM) systems, Product Information Management (PIM) systems, and Content Management Systems (CMS). You can see them depicted in Figure 1-10.

Figure 1-10. Big technology components for operational data.

The architecture depicted in Figure 1-10 faced challenges from emerging software styles, introduced as early as 2001 in the Agile Manifesto.16 These styles eventually evolved into what is now known as microservices.17 This shift was prompted by the recognition that large technology components lacked the speed and agility companies need to evolve, scale, and adapt to change effectively. The result is a “locked in” syndrome: companies become so dependent on the technologies running their value chain that they can no longer modify that very same value chain, causing a gradual loss of competitiveness as reality changes.

Microservices were successfully put forward as an alternative that breaks up these big components, the monoliths, into the smallest possible services, where a service means technology capable of performing an action. This architecture is referred to as hexagonal because each service is packaged as a product and visualized as a box (Figure 1-11).

Figure 1-11. Microservices for operational data.

Microservice architecture is highly flexible, with a plasticity that allows for speed, scale, and reorientation. Amazon, Netflix, and Uber run on microservice architecture (indeed, Amazon was an early pioneer of microservices). This makes it possible for these companies to continually adapt to customer behaviors and needs while adding new services to their platforms. Eventually, the microservice architecture inspired a new movement: Data Mesh.

Data Mesh breaks up the monoliths for analytical data. Learning from microservices, data mesh was put forward as a vision in the late 2010s. Data mesh proposed an architecture of decoupled, small units of analytical data. Contrary to operational data, analytical data does not run a company. Instead, it reflects on the company and innovates via analytical use cases driven by Machine Learning (ML) and Artificial Intelligence (AI). Traditionally, analytical data has been stored, consumed, and exposed in big, rather complex data platform technologies. These have typically been:

  • Data warehouses for Business Intelligence (BI)

  • Data lakes for BI, ML and AI

  • Data lakehouses, also for BI, ML, and AI, along with governance features18

As the need for analytical data increased during the last 20 years, the data platforms in Figure 1-12 became bottlenecks.

Figure 1-12. Centralized data platforms for analytical data become bottlenecks.

The inconveniences of the architecture in Figure 1-12 are manifold. First and foremost, it mirrors the situation of operational data in Figure 1-10: analytical data is “locked in” to a data platform that operates slowly and lacks the capacity to scale and reorient itself. The centralized data platform cannot keep up with the increasing demand for analytical data. Providers can’t get their data into the platform fast enough or in good enough quality, and consumers in turn can’t access and use the data they need.

Data Mesh was proposed as an alternative, advocating an architecture for analytical data similar to what microservices offer for operational data.19 The centralized data platform is broken up into business domains with data products, as visualized in Figure 1-13.

Figure 1-13. Data mesh for analytical data.

A data mesh architecture will scale faster than a centralized architecture and express each domain more clearly, because it avoids the centralized solution’s lack of plurality: a data mesh has multiple, domain-specific models, not one central canonical data model.

With the Meta Grid, I suggest stitching together metadata from various metadata repositories as a grid. This will make metadata management more robust and, in turn, let you succeed with data, information, and knowledge management altogether.

A metadata repository is a mirror of your company’s IT landscape. I’ll explain and discuss metadata repositories in depth in Chapter 2. Metadata repositories have monolithic tendencies in the sense that they all claim that they, and they alone, represent the entire truth about the IT landscape. However, none of them do. They all represent a certain vision of the IT landscape. The way I suggest decentralizing metadata is different from microservices and data mesh (which also differ from each other). The monolithic tendency in metadata repositories is in fact an illusion: the illusion of a total view of the IT landscape that every metadata repository is prone to impress on you. With this book, I aim to dispel that illusion and instead create a more methodological and holistic approach to metadata.

In Figure 1-14, you can see a depiction of metadata repositories as both centralized tools and at the same time, silos of metadata that are not coordinated.

Figure 1-14. Metadata repositories for metadata.

Microservices and data mesh employ comparable thinking and architectural patterns for distinct purposes: executing the value chain and creating analytical use cases, respectively. The Meta Grid follows a similar approach and architecture, albeit with its own distinct purpose.

Creating a robust, coordinated overview of the IT landscape across all metadata repositories allows you to succeed with data, information, and knowledge management. The Meta Grid, placed in this context, is shown in Figure 1-15.

Figure 1-15. The Meta Grid for metadata is the third wave of data decentralization.

Notice that all metadata repositories will be placed in either a data, information, or knowledge management domain. The three management disciplines are the only three domains in the Meta Grid and have distinct metadata repositories.

You can see a depiction of the Meta Grid in Figure 1-16, with a metadata container of Sensitive Data Types used in a unified manner across metadata repositories in data, information, and knowledge management.

Figure 1-16. The Meta Grid, with sensitive data types used as an example.
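The idea of a metadata container that is defined in one source and used by several repositories can be sketched in a few lines of Python. This is an illustrative sketch only, not the architecture detailed later in the book. The class names `MetadataContainer` and `Repository`, the repository names, and the listed sensitive data types are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataContainer:
    """A small unit of metadata, defined in exactly one source repository."""
    name: str
    source: str    # hypothetical repository where the container is defined
    values: tuple  # the metadata itself, immutable for consumers


@dataclass
class Repository:
    """A metadata repository placed in one of the three domains."""
    name: str
    domain: str    # "data", "information", or "knowledge"
    consumed: dict = field(default_factory=dict)

    def consume(self, container):
        # Hold a reference to the shared container instead of copying its values.
        self.consumed[container.name] = container


# Hypothetical container, defined once in a single source.
sensitive = MetadataContainer(
    name="Sensitive Data Types",
    source="data_catalog",
    values=("email", "national_id", "health_record"),
)

# Hypothetical repositories, one per management discipline.
repos = [
    Repository("data_catalog", "data"),
    Repository("records_management_system", "information"),
    Repository("knowledge_graph", "knowledge"),
]
for repo in repos:
    repo.consume(sensitive)

# Every repository sees the same definition from the single source.
print(all(r.consumed["Sensitive Data Types"] is sensitive for r in repos))
```

The design choice to notice is that repositories hold a reference to the container instead of a private copy, so there is only one definition that can change, and it changes in one place.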

The Meta Grid architecture will be detailed in Part III of the book. Briefly, what is depicted in Figure 1-15 is a decentralized architecture that enables strategic management of metadata repositories. This approach saves time and money while offering a more comprehensive and robust overview compared to less deliberate metadata management methods.

Additionally, it’s important to note that this book doesn’t encompass every single metadata repository in existence. Instead, consider the Meta Grid as something that can expand continuously to accommodate more repositories. I’ll delve into a significant subset of metadata repositories in Chapters 3, 4, and 5, covering data management in Chapter 3, information management in Chapter 4, and knowledge management in Chapter 5. From Chapter 8 onwards, I’ll discuss how to establish a Meta Grid architecture, including my proposal for a Data Discovery Team.

Setting up microservices and data mesh architectures can be difficult, with deep technical exercises and theoretical discussions. In comparison, a Meta Grid architecture is relatively simple and easy to build, but the organizational change aspect may be substantial.

Summary

In this chapter, we went through the entire content of the book in a condensed, introductory way. The most important thing to note from this chapter is that metadata management must be understood as a way to coordinate the many existing metadata repositories in companies. We also discussed the definitions of metadata, metadata management, and metadata repositories. I explained that there are metadata repositories for data, information, and knowledge management. Further, I introduced the Data Discovery Team, which can play a key role in your company. Finally, I put forward the idea of a Meta Grid, which will make the task of metadata management more complete and give you a more thorough depiction of your IT landscape. Here are the specific takeaways:

  • Metadata management has a set of fundamentals. These are:

    • Anything can be metadata.

    • There will never be one globally accepted, applied standard for metadata.

    • Metadata repositories do not only describe data.

    • An enterprise-wide metadata repository is an impossibility.

    • There are many metadata repositories in a company—they are mostly monoliths.

    • Metadata repositories are maintained by three management disciplines: data, information, and knowledge management.

    • You need a data discovery team to organize and search metadata.

    • Metadata management must ignite a third, small wave of data decentralization: The Meta Grid.

  • Metadata Management must be reinterpreted to expand beyond traditional data management activities, as these alone do not reflect the realities in companies.

  • Information Management and Knowledge Management also depict the IT landscape in various metadata repositories.

  • The Data Discovery Team is inspired by reference librarians.

  • Reference librarians make use of not one, but a series of sources to provide complete answers to complex information needs.

  • Likewise, the data discovery team is trained to answer questions about the IT landscape from not one but many sources, to give answers that are as complete as possible.

  • The Meta Grid is a third wave of decentralization, following in the footsteps of microservices and data mesh.

  • The Meta Grid is smaller and simpler than microservices and data mesh.

1 DAMA, Data Management Body of Knowledge, Second Edition, Revised (Basking Ridge: Technics Publications, 2024), 395

2 “Metadata”, Encyclopedia of Knowledge Organization, ISKO, last modified 2020-03-16, https://www.isko.org/cyclo/metadata

3 “Metadata Standard,” Wikipedia, last modified 2024-04-18, https://en.wikipedia.org/wiki/Metadata_standard

4 I am thankful to John O’Gorman for challenging me particularly on this from the very earliest release of my manuscript. Discussing this and many things after this improved the communication of my thinking substantially.

5 J. Barrasa, J. Webber, Building Knowledge Graphs (Sebastopol: O’Reilly, 2023), Chapter 8

6 Rémy Fannader, Enterprise Architecture Fundamentals (Salt Lake City: Izzard Ink, 2021)

7 “DIKW Pyramid”, Wikipedia, Last updated 2024-05-14, https://en.wikipedia.org/wiki/DIKW_pyramid

8 N. Venkatraman, “Managing IT Resources as a Value Center,” IS Executive Seminar Series, Cranfield School of Management (1996).

9 Joe Reis, Everything Ends—My Journey with the Modern Data Stack, 2024-02-17, https://joereis.substack.com/p/everything-ends-my-journey-with-the

10 A literature review of Information Management can be found in: Anuj Sharma, Nripendra P. Rana, Robin Nunkoo, “Fifty years of information management research: A conceptual structure analysis using structural topic modeling,” International Journal of Information Management, 58 (2021), https://www.sciencedirect.com/science/article/abs/pii/S0268401221000098

11 Aristotle, “Nicomachean Ethics,” Translated by Adam Beresford. (London: Penguin Classics, 2020)

12 Stephanie Barnes: “The Essentials of Radical KM,” Das Kuratierte Dossier, Vol. 5 “Knowledge Management Essentials,” Gesellschaft für Wissensmanagement e. V. (March 2023) https://www.gfwm.de/dossier-kmessentials-radicalkm/

13 Ole Olesen-Bagneux, The Enterprise Data Catalog—Improve Data Discovery, Ensure Data Governance, and Enable Innovation, (Sebastopol, CA: O’Reilly, 2023) and my LinkedIn newsletter: Symphony of Search

14 Taylor & Francis: The Reference Librarian

15 DAMA, Data Management Body of Knowledge, Second Edition, Revised (Basking Ridge: Technics Publications, 2024), 406-407

16 2001: https://agilemanifesto.org/

17 Some essential sources on microservices are: Martin Fowler, “Microservices—a definition of this new architectural term,” 2014-03-25: https://martinfowler.com/articles/microservices.html; Sam Newman, Building Microservices—Designing Fine-Grained Systems, 2nd edition (Sebastopol, CA: O’Reilly, 2021); Guillaume Bodet, Scrum en Action, (Paris: Pearson, 2012)

18 Zhamak Dehghani: “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh,” 2019-05-20, https://martinfowler.com/articles/data-monolith-to-mesh.html; Zhamak Dehghani: “Data Mesh Principles and Logical Architecture,” 2020-12-03, https://martinfowler.com/articles/data-mesh-principles.html

19 Two books were published simultaneously on this topic: Zhamak Dehghani, Data Mesh: Delivering Data-Driven Value at Scale (Sebastopol, CA: O’Reilly, 2022); Piethein Strengholt, Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric, 2nd edition (Sebastopol, CA: O’Reilly, 2023; first edition in 2020)
