Chapter 4. Principle of the Self-Serve Data Platform

Simplicity is about subtracting the obvious and adding the meaningful.

John Maeda

So far I have offered two fundamental shifts toward data mesh: a distributed data architecture and ownership model oriented around business domains, and data shared as a usable and valuable product. Over time, these two seemingly simple and rather intuitive shifts can have undesired consequences: duplication of efforts in each domain, increased cost of operation, and likely large-scale inconsistencies and incompatibilities across domains.

Expecting domain engineering teams to own and share analytical data as a product, in addition to building applications and maintaining digital products, raises legitimate concerns for both the practitioners and their leaders. The concerns that I often hear from leaders, at this point in the conversation, include: “How am I going to manage the cost of operating the domain data products, if every domain needs to build and own its own data?” “How do I hire the data engineers, who are already hard to find, to staff in every domain?” “This seems like a lot of overengineering and duplicate effort in each team.” “What technology do I buy to provide all the data product usability characteristics?” “How do I enforce governance in a distributed fashion to avoid chaos?” “What about copied data—how do I manage that?” And so on. Similarly, domain engineering teams and practitioners voice concerns such as, “How can we extend the responsibility of our team to not only build applications to run the business, but also share data?”

Addressing these questions is the reason that data mesh’s third principle, self-serve data infrastructure as a platform, exists. It’s not that there is any shortage of data and analytics platforms, but we need to make changes to them so they can scale out sharing, accessing, and using analytical data, in a decentralized manner, for a new population of generalist technologists. This is the key differentiation of a data mesh platform.

Figure 4-1 depicts the extraction of domain-agnostic capabilities out of each domain and their move into a self-serve infrastructure as a platform. The platform is built and maintained by a dedicated platform team.

Figure 4-1. Extracting and harvesting domain-agnostic infrastructure into a separate data platform

In this chapter, I apply platform thinking to the underlying infrastructure capabilities to clarify what we mean by the term platform in the context of data mesh. Then, I share the unique characteristics of data mesh’s underlying platform. Later chapters, such as Chapters 9 and 10, will go into further detail about the platform’s capabilities and how to approach its design. For now, let’s discuss how data mesh’s underlying platform is different from many solutions we have today.

Note

In this chapter, I use the phrase data mesh platform as shorthand for a set of underlying data infrastructure capabilities. A singular form of the term platform does not mean a single solution or a single vendor with tightly integrated features. It’s merely a placeholder for a set of technologies that one can use to achieve the objectives mentioned in “Data Mesh Platform Thinking”, a set of technologies that are independent and yet play nicely together.

Data Mesh Platform: Compare and Contrast

There is a large body of technology solutions that fall into the category of data infrastructure and are often posed as a platform. Here is a small sample of the existing platform capabilities:

  • Analytical data storage in the form of a lake, warehouse, or lakehouse

  • Data processing frameworks and computation engines to process data in batch and streaming modes

  • Data querying languages, based on one of two modes: computational data flow programming or algebraic SQL-like statements

  • Data catalog solutions to enable data governance as well as discovery of all data across an organization

  • Pipeline workflow management, orchestrating complex data pipeline tasks or ML model deployment workflows

Many of these capabilities are still needed to enable a data mesh implementation. However, there is a shift in approach and the objectives of a data mesh platform. Let’s do a quick compare and contrast.

Figure 4-2 shows a set of unique characteristics of a data mesh platform in comparison to the existing ones. Note that the data mesh platform can utilize existing technologies yet offer these unique characteristics.

The following sections clarify further how data mesh approaches building a self-serve platform.

Figure 4-2. Data mesh platform’s differentiating characteristics

Serving Autonomous Domain-Oriented Teams

The main responsibility of the data mesh platform is to enable existing and new domain engineering teams to take on the new, embedded responsibilities of building, sharing, and using data products end to end: capturing data from operational systems and other sources, transforming the data, and sharing it as a product with the end data users.

The platform must allow teams to do this in an autonomous way without any dependence on centralized data teams or intermediaries.

Many existing vendor technologies are built with an assumption of a centralized data team, capturing and sharing data for all domains. The assumptions around this centralized control have deep technical consequences such as:

  • Cost is estimated and managed monolithically, not per domain’s isolated resources.

  • Security and privacy management assumes physical resources are shared under the same account and doesn’t scale to an isolated security context per data product.

  • A central pipeline (DAG) orchestration assumes management of all data pipelines centrally—with a central pipeline configuration repository and a central monitoring portal. This is in conflict with independent pipelines, each small and allocated to a data product implementation.

These are a few examples to demonstrate how existing technologies get in the way of domain teams acting autonomously.

Managing Autonomous and Interoperable Data Products

Data mesh puts a new construct, a domain-oriented data product, at the center of its approach. This is a new architectural construct that autonomously delivers value. It encodes all the behavior and data needed to provide discoverable, usable, trustworthy, and secure data to its end data users. Data products share data with each other and are interconnected in a mesh. Data mesh platforms must work with this new construct and must support managing its autonomous life cycle and all its constituents.

This platform characteristic is different from the existing platforms today, which manage behavior (e.g., data processing pipelines), data and its metadata, and the policy that governs the data as independent pieces. It is possible to create the new data product management abstraction on top of existing technologies, but it is not very elegant.

A Continuous Platform of Operational and Analytical Capabilities

The principle of domain ownership demands a platform that enables autonomous domain teams to manage data end to end. This closes the gap organizationally between the operational plane and the analytical plane. Hence, a data mesh platform must be able to provide a more connected experience. Whether the team is building and running an application or sharing and using data products for analytical use cases, the team’s experience should be connected and seamless. For the platform to be successfully adopted by existing domain technology teams, it must remove the main barrier to adoption: the schism between the operational and analytical stacks.

The data mesh platform must close the gap between analytical and operational technologies. It must find ways to get them to work seamlessly together, in a way that is natural to a cross-functional domain-oriented data and application team.

For example, today the computation fabric running data processing pipelines, such as Spark, is managed on a different clustering architecture, separate and often disconnected from the computation fabric that runs operational services, such as Kubernetes. In order to create data products that collaborate closely with their corresponding microservice, i.e., source-aligned data products, we need a closer integration of the computation fabrics. I have worked with very few organizations that have been running both computation engines on the same computation fabric.

Due to inherent differences between the two planes, there are many areas of the platform where the technology for operational and analytical systems must remain different. For example, consider the case of tracing for debugging and audit purposes. The operational systems use the OpenTelemetry standard for tracing (API) calls across distributed applications, in a tree-like structure. On the other hand, data processing workloads use OpenLineage to trace the lineage of data across distributed data pipelines. There are enough differences between the two planes to mind their gap. However, it is important that these two standards integrate nicely. After all, in many cases, the journey of a piece of data starts from an application call in response to a user action.
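
To make the integration concrete, here is a minimal sketch that captures an OpenTelemetry trace ID on the operational side and carries it along in an OpenLineage-style lineage event on the analytical side. The OpenTelemetry calls assume the opentelemetry-api package; the lineage event is a plain dictionary loosely shaped after an OpenLineage run event, and the operationalTrace facet is a hypothetical custom facet rather than part of either standard.

```python
# A sketch of correlating an operational trace with a data lineage event.
# The OpenTelemetry calls use the opentelemetry-api package; the lineage
# event is a plain dict loosely shaped after an OpenLineage run event, and
# "operationalTrace" is a hypothetical custom facet.
import uuid
from datetime import datetime, timezone

from opentelemetry import trace

tracer = trace.get_tracer("player.playback-service")

def handle_play_request(listener_id: str) -> dict:
    """Operational side: serve a request inside a trace span."""
    with tracer.start_as_current_span("play-request") as span:
        trace_id = format(span.get_span_context().trace_id, "032x")
        # ... serve the request, publish a play event ...
        return {"listener_id": listener_id, "trace_id": trace_id}

def build_lineage_event(job_name: str, upstream_trace_id: str) -> dict:
    """Analytical side: a lineage event that keeps a pointer back to the
    operational trace that originated the data."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "job": {"namespace": "player-domain", "name": job_name},
        "run": {
            "runId": str(uuid.uuid4()),
            "facets": {"operationalTrace": {"traceId": upstream_trace_id}},
        },
    }

request = handle_play_request("listener-42")
event = build_lineage_event("play-events-transformation", request["trace_id"])
```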

Designed for a Generalist Majority

Another barrier to the adoption of data platforms today is the level of proprietary specialization that each technology vendor assumes—the jargon and the vendor-specific knowledge. This has led to the creation of scarce specialized roles such as data engineers.

In my opinion there are a few reasons for this unscalable specialization: lack of (de facto) standards and conventions, lack of incentives for technology interoperability, and lack of incentive to make products super simple to use. I believe this is the residue of the big monolithic platform mentality: a single vendor provides soup-to-nuts functionality to store your data on its platform and attaches additional services to keep the data, and its processing, under its control.

A data mesh platform must break this pattern and start with the definition of a set of open conventions that promote interoperability between different technologies and reduce the number of proprietary languages and experiences one specialist must learn to generate value from data. Incentivizing and enabling generalist developers with experiences, languages, and APIs that are easy to learn is a starting point to lower their cognitive load. To scale out data-driven development to the larger population of practitioners, data mesh platforms must stay relevant to generalist technologists. They must move to the background, fit naturally into the native tools and programming languages generalists use, and get out of their way.

Needless to say, this should be achieved without compromising on the software engineering practices that result in sustainable solutions. For example, many low-code or no-code platforms promise to work with data, but compromise on testing, versioning, modularity, and other techniques. Over time they become unmaintainable.

Note

The phrase generalist technologist (experts) refers to the population of technologists who are often referred to as T-shaped or Paint Drip people. These are developers experienced in a broad spectrum of software engineering who, at different points in time, focus and gain deep knowledge in one or two areas.

The point is that it is possible to go deep in one or two areas while exploring many others.

They contrast with specialists, who only have expertise in one specific area; their focus on specialization doesn’t allow them to explore a diverse spectrum.

In my mind, future generalists will be able to work with data: creating and sharing data through data products, or using them for feature engineering and ML training when the model has already been developed by specialist data scientists. Essentially, they use AI as a service.

As of now, the majority of data work requires specialization and a large amount of effort to gain expertise over a long period of time. This inhibits the entry of generalist technologists and has led to a scarcity of data specialists.

Favoring Decentralized Technologies

Another common characteristic of existing platforms is the centralization of control. Examples include centralized pipeline orchestration tools, centralized catalogs, centralized warehouse schema, centralized allocation of compute/storage resources, and so on. The reason for data mesh’s focus on decentralization through domain ownership is to avoid organizational synchronization and bottlenecks that ultimately slow down the speed of change. Though on the surface this is an organizational concern, the underlying technology and architecture directly influence organizational communication and design. A monolithic or centralized technology solution leads to centralized points of control and teams.

Data mesh platforms need to put the decentralization of data sharing, control, and governance at the heart of their design. They must inspect every centralized aspect of the design that can result in lockstep team synchronization, centralization of control, and tight coupling between autonomous teams.

Having said that, there are many aspects of infrastructure that need to be centrally managed to reduce the unnecessary tasks that each domain team performs in sharing and using data, e.g., setting up data processing compute clusters. This is where an effective self-serve platform shines, centrally managing underlying resources while allowing independent teams to achieve their outcomes end to end, without tight dependencies on other teams.

Domain Agnostic

Data mesh creates a clear delineation of responsibility between domain teams—who focus on creating business-oriented products, services that are ideally data-driven, and data products—and the platform teams who focus on technical enablers for the domains. This is different from the existing delineation of responsibility where the data team is often responsible for amalgamation of domain-specific data for analytical usage, as well as the underlying technical infrastructure.

This delineation of responsibility needs to be reflected in the platform capabilities. The platform must strike a balance between providing domain-agnostic capabilities, while enabling domain-specific data modeling, processing, and sharing across the organization. This demands a deep understanding of the data developers and the application of product thinking to the platform.

Data Mesh Platform Thinking

Platform: raised level surface on which people or things can stand.

Oxford Languages

The word platform is one of the most commonly used terms in our everyday technical jargon and is sprinkled all over organizations’ technical strategies. Yet it is hard to define and subject to interpretation.

To ground our understanding of the platform in the context of data mesh, I draw from the work of a few of my trusted sources:

A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination.

Evan Bottcher, “What I Talk About When I Talk About Platforms”

The purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy. The stream-aligned team maintains full ownership of building, running, and fixing their application in production. The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services.

The platform simplifies otherwise complex technology and reduces cognitive load for teams that use it.

Matthew Skelton and Manuel Pais, Team Topologies

Platforms are designed one interaction at a time. Thus, the design of every platform should start with the design of the core interaction that it enables between producers and consumers. The core interaction is the single most important form of activity that takes place on a platform—the exchange of value that attracts most users to the platform in the first place.1

Geoffrey G. Parker et al., Platform Revolution

A platform has a few key objectives that I like to take away and apply to data mesh:

Enable autonomous teams to get value from data
A common characteristic that we see is the ability to enable teams who use the platform to complete their work and achieve their outcomes with a sense of autonomy and without requiring another team to get engaged directly in their workflow, e.g., through backlog dependencies. In the context of data mesh, enabling domain teams with new responsibilities of sharing analytical data, or using analytical data for building ML-based products, in an autonomous way, is a key objective of a data mesh platform. The ability to use the platform capabilities through self-serve APIs is critical to enable autonomy.
Exchange value with autonomous and interoperable data products
Another key aspect of a platform is to intentionally design what value is being exchanged and how. In the case of data mesh, data products are the unit of value exchange, between data users and data providers. A data mesh platform must build in the frictionless exchange of data products as a unit of value in its design.
Accelerate exchange of value by lowering the cognitive load
In order to simplify and accelerate the work of domain teams in delivering value, platforms must hide technical and foundational complexity. This lowers the cognitive load of the domain teams to focus on what matters; in the case of data mesh, this is creating and sharing data products.
Scale out data sharing
Data mesh is a solution offered to solve the problem organizations face in getting value from their data at scale. Hence, the design of the platform must cater for scale: sharing data across the many domains within the organization, as well as across boundaries of trust outside of the organization in the wider network of partners. One of the blockers to this scale is the lack of interoperability for sharing data, securely, across multiple platforms. A data mesh platform must design for interoperability with other platforms to share data products.
Support a culture of embedded innovation
A data mesh platform supports a culture of innovation by removing activities that do not directly contribute to innovation and by making it really easy to find data, capture insights, and use data for ML model development.

Figure 4-3 depicts these objectives applied to an ecosystem of domain teams sharing and using data products.

Figure 4-3. Objectives of the data mesh platform

Now, let’s talk about how a data mesh platform achieves these objectives.

Enable Autonomous Teams to Get Value from Data

In designing the platform, it is helpful to consider the roles of platform users and their journey in sharing and using data products. The platform can then focus on how to create a frictionless experience for each journey. For example, let’s consider two of the main personas of the data mesh ecosystem: data product developers and data product users. Of course, each of those personas includes a spectrum of people with different skill sets, but for this conversation we can focus on the aspects of their journeys that are common across this spectrum. There are other roles such as data product owner whose journey is as important in achieving the outcome of creating successful data products; favoring brevity, I leave them out of this example.

Enable data product developers

The delivery journey of a data product developer involves developing a data product; testing, deploying, monitoring, and updating it; and maintaining its integrity and security—with continuous delivery in mind. In short, the developer is managing the life cycle of a data product, working with its code, data, and policies as one unit. As you can imagine, there is a fair bit of infrastructure that needs to be provisioned to manage this life cycle.

Provisioning and managing the underlying infrastructure for life cycle management of a data product requires specialized knowledge of today’s tooling and is difficult to replicate in each domain. Hence, the data mesh platform must implement all necessary capabilities allowing a data product developer to build, test, deploy, secure, and maintain a data product without worrying about the underlying infrastructure resource provisioning. It must enable all domain-agnostic and cross-functional capabilities.

Ultimately, the platform must enable the data product developer to just focus on the domain-specific aspects of data product development:

  • Transformation code, the domain-specific logic that generates and maintains the data

  • Build-time tests to verify and maintain the domain’s data integrity

  • Runtime tests to continuously monitor that the data product meets its quality guarantees

  • Developing a data product’s metadata such as its schema, documentation, etc.

  • Declaration of the required infrastructure resources

The rest must be taken care of by the data mesh platform, for example, infrastructure provisioning—storage, accounts, compute, etc. A self-serve approach exposes a set of platform APIs for the data product developer to declare their infrastructure needs and let the platform take care of the rest. This is discussed in detail in Chapter 14.
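
As a minimal sketch of what such a declaration could look like, the following uses hypothetical names (DataProductSpec, provision, and their fields are illustrative, not the API of any particular platform) to show a developer declaring infrastructure needs and handing the rest to the platform:

```python
# A hypothetical self-serve declaration: the developer states *what* the data
# product needs; the platform decides *how* to provision it. All names and
# fields here are illustrative only.
from dataclasses import dataclass, field

@dataclass
class DataProductResources:
    storage: str = "object-store"      # e.g., blob storage for the output port
    compute: str = "batch"             # batch or streaming processing
    retention_days: int = 365
    confidentiality: str = "internal"  # drives encryption and access policies

@dataclass
class DataProductSpec:
    name: str
    domain: str
    transformation_entrypoint: str     # path to the domain-specific code
    resources: DataProductResources = field(default_factory=DataProductResources)

def provision(spec: DataProductSpec) -> None:
    """Stand-in for the platform's provisioning API: a real platform would
    create storage, accounts, compute, encryption keys, and so on."""
    print(f"provisioning {spec.domain}/{spec.name} with {spec.resources}")

provision(
    DataProductSpec(
        name="play-events",
        domain="player",
        transformation_entrypoint="player/play_events/transform.py",
    )
)
```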

Enable data product users

Data users’ journey—whether analyzing data to create insights or developing machine learning models—starts with discovering the data. Once the data is discovered, they need to get access to it, and then understand it and deep dive to explore it further. If the data has proven to be suitable, they will continue to use it. Using the data is not limited to one-time access; the consumers continue receiving and processing new data to keep their machine learning models or insights up to date. The data mesh platform builds the underlying mechanisms that facilitate such a journey and provides the capabilities needed for data product consumers to get their job done without friction.

For the platform to enable this journey autonomously, it must reduce the need for manual intervention. For example, it must remove the need to chase the team that created the data, or the governance team, to justify and get access to the data. The platform automates the access request process and grants access based on automated evaluation of the consumer’s request.
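
A minimal sketch of such automated evaluation follows; the request fields, policy shape, and confidentiality classes are hypothetical and only illustrate the idea of granting access by policy rather than by chasing people:

```python
# A hypothetical sketch of policy-based access evaluation. The request
# fields, policy shape, and confidentiality ranking are illustrative only.
from dataclasses import dataclass

CONFIDENTIALITY_RANK = {"public": 0, "internal": 1, "restricted": 2}

@dataclass
class AccessRequest:
    requester_team: str
    purpose: str
    confidentiality: str  # highest confidentiality class being requested

POLICY = {
    "allowed_purposes": {"analytics", "ml-training"},
    "max_confidentiality": "internal",
}

def evaluate(request: AccessRequest, policy: dict = POLICY) -> bool:
    """Grant access automatically when the request satisfies the data
    product's policy; anything else would fall back to manual review."""
    return (
        request.purpose in policy["allowed_purposes"]
        and CONFIDENTIALITY_RANK[request.confidentiality]
        <= CONFIDENTIALITY_RANK[policy["max_confidentiality"]]
    )

granted = evaluate(AccessRequest("recommendations", "ml-training", "internal"))
```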

Exchange Value with Autonomous and Interoperable Data Products

An interesting lens on the data mesh platform is to view it as a multisided platform—one that creates value primarily by enabling direct interactions between two (or more) distinct parties. In the case of data mesh, those parties are data product developers, data product owners, and data product users.

This particular lens can be a source of unbounded creativity for building a platform whose success is measured directly by exchanging value, i.e., data products. The value can be exchanged on the mesh, between data products, or at the edge of the mesh, between the end products, such as an ML model, a report, or a dashboard, and the data products. The mesh essentially becomes the organizational data marketplace. This particular data mesh platform characteristic can be a catalyst for a culture change within the organization, taking sharing to the next level.

As discussed in the previous section, an important aspect of exchanging value is to be able to do that autonomously, without the platform getting in the way. For data product developers, this means being able to create and serve their data products without the constant need for hand-holding or dependency on the platform team.

Create higher-order value by composing data products

The exchange of value goes beyond using a single data product and often extends to the composition of multiple data products. For example, the interesting insights about Daff’s listeners are generated by cross-correlating their behavior while listening to music, the artists they follow, their demographic, their interactions with social media, the influence of their friends network, and the cultural events that surround them. These come from multiple data products that need to be correlated and composed into a matrix of features.

The platform makes data product compatibility possible. For example, platforms enable data product linking—when one data product uses data and data types (schema) from another data product. For this to be seamlessly possible, the platform provides a standardized and simple way of identifying data products, addressing data products, connecting to data products, reading data from data products, etc. Such simple platform functions create a mesh of heterogeneous domains with homogeneous interfaces. I will cover this in Chapter 13.
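
The following sketch hints at what such homogeneous interfaces could look like. The DataProduct protocol and its methods are hypothetical stand-ins for the platform’s standardized identify, address, connect, and read functions, and the correlate function shows how composition becomes domain logic layered on a uniform access pattern:

```python
# A hypothetical sketch of a homogeneous data product interface. The
# DataProduct protocol stands in for the platform's standardized identify,
# address, connect, and read functions; names are illustrative only.
from typing import Iterator, Optional, Protocol

class DataProduct(Protocol):
    def address(self) -> str: ...           # globally unique identifier/URL
    def schema(self) -> dict: ...           # shared data types enable linking
    def read(self, since: Optional[str] = None) -> Iterator[dict]: ...

def correlate(listeners: DataProduct, social: DataProduct) -> Iterator[dict]:
    """Compose two data products: enrich listener behavior with social
    interactions. The join logic is domain-specific; the access pattern
    is the same for every product on the mesh."""
    interactions = {row["listener_id"]: row for row in social.read()}
    for row in listeners.read():
        yield {**row, **interactions.get(row["listener_id"], {})}
```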

Accelerate Exchange of Value by Lowering the Cognitive Load

Cognitive load was first introduced in the field of cognitive science as the amount of working memory needed to hold temporary information to solve a problem or learn.2 There are multiple factors influencing the cognitive load, such as the intrinsic complexity of the topic at hand or how the task or information is presented.

Platforms are increasingly considered a way of reducing the cognitive load of developers to get their job done. They do this by reducing the amount of detail and information presented to the developer: abstracting complexity.

As a data product developer, I should be able to express what my domain-agnostic wishes are without describing exactly how to implement them. For example, as a developer I should be able to declare the structure of my data, its retention period, its potential size, and its confidentiality class and leave it to the platform to create the data structures, provision the storage, perform automatic encryption, manage encryption keys, automatically rotate keys, etc. This is domain-agnostic complexity that as a data developer or user I should not be exposed to.

There are many techniques for abstracting complexity without sacrificing configurability. The following two methods are commonly applied.

Abstract complexity through declarative modeling

Over the last few years, operational platforms such as container orchestrators, e.g., Kubernetes, or infrastructure provisioning tools, e.g., Terraform, have established a new model for abstracting complexity through declarative modeling of the target state. This is in contrast with other methods such as using imperative instructions to command how to build the target state. Essentially, the former focuses on the what, and the latter focuses on the how. This approach has been widely successful in making the life of a developer much simpler.

In many scenarios, declarative modeling hits its limitations very quickly. For example, defining data transformation logic through declarations reaches a point of diminishing returns as soon as the logic gets complex.

However, systems that can be described through their state, such as provisioned infrastructure, lend themselves well to a declarative style. This is also true about the data mesh infrastructure as a platform. The target state of infrastructure to manage the life cycle of a data product can be defined declaratively.
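
As a minimal sketch, assuming hypothetical field names rather than any particular platform’s manifest format, a declarative data product specification could look like this, with the platform left to reconcile the infrastructure to the declared target state:

```python
# A hypothetical declarative data product manifest: the developer describes
# the target state, and the platform reconciles the infrastructure to match
# it. Field names are illustrative only. Requires the pyyaml package.
import yaml

MANIFEST = """
dataProduct:
  name: play-events
  domain: player
  output:
    format: parquet
    retentionDays: 365
    confidentiality: internal    # platform derives encryption and key rotation
  transformation:
    entrypoint: player/play_events/transform.py
  guarantees:
    freshness: PT1H              # ISO 8601 duration: data no older than 1 hour
    completeness: 0.99
"""

spec = yaml.safe_load(MANIFEST)["dataProduct"]
print(f"reconciling infrastructure for {spec['domain']}/{spec['name']}")
```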

Abstract complexity through automation

Removing human intervention and manual steps from the data product developer journey through automation is another way to reduce complexity, particularly complexity arising from manual errors throughout the process. Opportunities to automate aspects of a data mesh implementation are ubiquitous. The provisioning of the underlying data infrastructure itself can be automated using infrastructure as code3 techniques. Additionally, many actions in the data value stream, from production to consumption, can be automated.

For example, today the data certification or verification approval process is often done manually. This is an area of immense opportunity for automation. The platform can automate verifying the integrity of data, apply statistical methods in testing the nature of the data, and even use machine learning to discover unexpected outliers. Such automation removes complexity from the data verification process.
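
To illustrate one small slice of this, here is a sketch of a statistical outlier check that a platform could run automatically before data is published; the threshold and the choice of a median-based score are illustrative, not a prescribed method:

```python
# A sketch of an automated verification step: flag unexpected outliers in a
# new batch before publishing. The median-based score and threshold are
# illustrative, not a prescribed method.
from statistics import median

def flag_outliers(values: list[float], threshold: float = 3.5) -> list[float]:
    """Return values whose modified z-score (based on the median absolute
    deviation) exceeds the threshold: candidates for review."""
    if len(values) < 2:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# e.g., daily plays per listener arriving in a new batch
suspicious = flag_outliers([3.0, 5.0, 4.0, 6.0, 4.5, 250.0])  # flags 250.0
```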

Scale Out Data Sharing

One issue I’ve noticed in the existing big data technology landscape is the lack of standards for interoperable solutions that would enable data sharing at scale: for example, the lack of a unified model for authentication and authorization when accessing data, the absence of standards for expressing and transmitting privacy rights along with the data, and the lack of standards for presenting the temporality aspects of data. These missing standards inhibit scaling the network of usable data beyond the boundaries of organizational trust.

Most importantly, the data technology landscape is missing the Unix philosophy:

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together…

Doug McIlroy

I think we got incredibly lucky with very special people (McIlroy, Ritchie, Thompson, and others) seeding the culture, the philosophy, and the way of building software in the operational world. That’s why we have managed to build powerfully scaled and complex systems through loose integration of simple and small services.

For some reason, we have abandoned this philosophy when it comes to big data systems, perhaps because of those early assumptions (see “Characteristics of Analytical Data Architecture”) that seeded the culture. Or perhaps because at some point we decided to separate data (the body) from its code (the soul), which led to establishing a different philosophy around it.

If a data mesh platform wants to realistically scale out sharing data, within and beyond the bounds of an organization, it must wholeheartedly embrace the Unix philosophy and yet adapt it to the unique needs of data management and data sharing. It must design the platform as a set of interoperable services that can be implemented by different vendors with different implementations yet play nicely with the rest of the platform services.

Take observability as an example of a capability that the platform provides—the ability to monitor the behavior of all data products on the mesh, detect any disruptions, errors, and undesirable access, and notify the relevant teams to recover their data products. For observability to work, multiple platform services need to cooperate: the data products emitting and logging information about their operation; the service that captures the emitted logs and metrics and provides a holistic mesh view; the services that search, analyze, and detect anomalies and errors within those logs; and the services that notify the developers when things go wrong. To build this under the Unix philosophy, we need to be able to pick and choose these services and connect them together. The key to the simple integration of these services is interoperability,4 a common language and APIs by which the logs and metrics are expressed and shared. Without such a standard, we fall back to a single monolithic (but well-integrated) solution that constrains access to data to a single hosting environment. We fail to share and observe data across environments.
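
A small sketch of that interoperability point: if every data product emits its operational metrics in one agreed-upon shape, the collection, anomaly detection, and notification services become pluggable. The event shape and interfaces below are hypothetical:

```python
# A hypothetical sketch: if every data product emits metrics in one agreed
# shape, collectors, anomaly detectors, and notifiers become interchangeable.
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Protocol

@dataclass
class DataProductMetric:
    data_product: str  # e.g., "player/play-events"
    metric: str        # e.g., "freshness_seconds", "rows_served"
    value: float
    timestamp: str

class MetricSink(Protocol):
    """Any collector can be plugged in as long as it accepts this shape."""
    def emit(self, event: DataProductMetric) -> None: ...

class StdoutSink:
    def emit(self, event: DataProductMetric) -> None:
        print(json.dumps(asdict(event)))

sink: MetricSink = StdoutSink()
sink.emit(DataProductMetric(
    data_product="player/play-events",
    metric="freshness_seconds",
    value=420.0,
    timestamp=datetime.now(timezone.utc).isoformat(),
))
```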

Support a Culture of Embedded Innovation

Today, continuous innovation is arguably one of the core competencies of any business. Eric Ries introduced the Lean Startup5 to demonstrate how to scientifically innovate through short and rapid cycles of build-measure-learn. The concept has since been applied to the larger enterprise through Lean Enterprise6—a scaled innovation methodology.

The point is that to grow a culture of innovation—a culture of rapidly building, testing, and refining ideas—we need an environment that frees its people from unnecessary work, accidental complexity, and friction, and allows them to experiment. The data mesh platform removes unnecessary manual work, hides complexity, and streamlines the workflows of data product developers and users, to free them to innovate using data. A simple litmus test to assess how effective a data mesh platform is in doing that is to measure how long it takes for a team to dream up a data-driven experiment and get to use the required data to run the experiment. The shorter the time, the more mature the data mesh platform has become.

Another key point is: who is empowered to do the experiments? The data mesh platform supports each domain team in innovating and performing data-driven experiments. Data-driven innovations are no longer exclusive to the central data team; they must be embedded into each domain team as it develops its services, products, or processes.

Transition to a Self-Serve Data Mesh Platform

So far, I have talked about the key differences between existing data platforms and data mesh and covered the main objectives of the data mesh platform. Here, I’d like to leave you with a few actions you can take in transitioning to your data mesh platform.

Design the APIs and Protocols First

When you begin your platform journey, whether you are buying, building, or very likely both, start with selecting and designing the interfaces that the platform exposes to its users. The interfaces might be programmatic APIs, command-line interfaces, or graphical interfaces. Either way, decide on the interfaces first and then on the implementation of those interfaces through various technologies.

This approach is widely adopted by many cloud offerings. For example, cloud blob storage providers expose REST APIs7 to post, get, or delete objects. You can apply this approach to all capabilities of your platform.

In addition to the APIs, decide on the communication protocols and standards that enable interoperability. Taking inspiration from the internet—the one example of a massively distributed architecture—decide on the narrow waist8 protocols. For example, decide on the protocols governing how data products express their semantics, in what format they encode their time-variant data, what query languages each supports, what SLOs each guarantees, and so on.

Prepare for Generalist Adoption

I discussed earlier that a data mesh platform must be designed for the generalist majority (“Designed for a Generalist Majority”). Many organizations today are struggling to find data specialists such as data engineers, while there is a large population of generalist developers who are eager to work with data. The fragmented, walled, and highly specialized world of big data technologies has created an equally siloed and fragmented population of hyper-specialized data technologists.

In your evaluation of platform technologies, favor the ones that fit better with a natural style of programming known to many developers. For example, if you are choosing a pipeline orchestration tool, pick the ones that lend themselves to simple programming of Python functions—something familiar to a generalist developer—rather than the ones that try to create yet another domain-specific language (DSL) in YAML or XML with esoteric notations.
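
As a small illustration of that style, the pipeline below is nothing more than plain Python functions composed by a thin runner; no orchestrator-specific API is implied, and the step names are made up:

```python
# A small illustration of the "plain Python functions" style: pipeline steps
# are ordinary, unit-testable functions composed by a thin runner. No
# orchestrator-specific API is implied; the steps are made up.
from typing import Callable

def extract_play_events() -> list[dict]:
    return [{"listener_id": "l-1", "track": "t-9", "ms_played": 30_000}]

def keep_meaningful_plays(rows: list[dict]) -> list[dict]:
    # domain rule: count a play only if the listener stayed past 20 seconds
    return [r for r in rows if r["ms_played"] >= 20_000]

def publish(rows: list[dict]) -> list[dict]:
    print(f"publishing {len(rows)} rows to the data product's output port")
    return rows

def run_pipeline(*steps: Callable) -> None:
    data = None
    for step in steps:
        data = step(data) if data is not None else step()

run_pipeline(extract_play_events, keep_meaningful_plays, publish)
```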

In reality, there will be a spectrum of data products in terms of their complexity, and a spectrum of data product developers in terms of their level of specialization. The platform must satisfy this spectrum to mobilize data product delivery at scale. In all cases, applying evergreen engineering practices to build resilient and maintainable data products remains necessary.

Do an Inventory and Simplify

The separation of the analytical data plane and the operational plane has left us with two disjointed technology stacks, one dealing with analytical data and the other for building and running applications and services. As data products become integrated and embedded within the operational world, there is an opportunity to converge the two platforms and remove duplicates.

In the last few years the industry has experienced an overinvestment in technologies that are marketed as data solutions. In many cases their operational counterparts are perfectly suitable to do the job. For example, I have seen a new class of continuous integration and continuous delivery (CI/CD) tooling marketed under DataOps. On closer evaluation, these tools hardly offer any differentiating capability that existing CI/CD engines can’t provide.

When you get started, take an inventory of platform services that your organization has adopted and look for opportunities to simplify.

I do hope that the data mesh platform is a catalyst in simplification of the technology landscape and closer collaboration between operational and analytical platforms.

Create Higher-Level APIs to Manage Data Products

The data mesh platform must introduce a new set of APIs to manage data products as a new abstraction (“Managing Autonomous and Interoperable Data Products”). While many data platforms, such as the services you get from your cloud providers, include lower-level utility APIs—storage, catalog, compute—the data mesh platform must introduce a higher level of APIs that deal with a data product as an object.

For example, consider APIs to create a data product, discover a data product, connect to a data product, read from a data product, secure a data product, and so on. See Chapter 9 for the logical blueprint of a data product.
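
A sketch of such a higher-level API surface follows. The class and method names mirror the examples above but are illustrative only, not the interface of any existing product:

```python
# A hypothetical sketch of a data-product-centric platform API, sitting above
# the lower-level storage, compute, and catalog utilities. Method names mirror
# the examples in the text; signatures are illustrative only.
from abc import ABC, abstractmethod
from typing import Iterator, Optional

class DataProductConnection(ABC):
    @abstractmethod
    def read(self, since: Optional[str] = None) -> Iterator[dict]:
        """Read records from the data product's output port."""

class DataProductPlatform(ABC):
    @abstractmethod
    def create(self, manifest: dict) -> str:
        """Provision a data product from its declarative manifest and
        return its mesh-wide address."""

    @abstractmethod
    def discover(self, query: str) -> list[dict]:
        """Search the mesh for data products matching a query."""

    @abstractmethod
    def connect(self, address: str) -> DataProductConnection:
        """Resolve an address into a connection to an output port."""

    @abstractmethod
    def grant_access(self, address: str, principal: str, purpose: str) -> bool:
        """Evaluate a request and, where policy allows, grant access."""
```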

When establishing your data mesh platform, start with high-level APIs that work with the abstraction of a data product.

Build Experiences, Not Mechanisms

I have come across numerous platform building and buying situations where the articulation of the platform is anchored in the mechanisms it includes, as opposed to the experiences it enables. This approach to defining the platform often leads to bloated platform development and the adoption of overambitious and overpriced technologies.

Take data cataloging as an example. Almost every platform I’ve come across has a data catalog on its list of mechanisms, which leads to the purchase of the data catalog product with the longest list of features, and then to contorting the team’s workflows to fit the catalog’s inner workings. This process often takes months.

In contrast, your platform can start with the articulation of the single experience of discovering data products. Then, build or buy the simplest tools and mechanisms that enable this experience. Then rinse, repeat, and refactor for the next experience.

Begin with the Simplest Foundation, Then Harvest to Evolve

Given the length of this chapter discussing the objectives and unique characteristics of a data mesh platform, you might be wondering, “Can I even begin to adopt data mesh today, or should I wait some time to build the platform first?” The answer is to begin adopting a data mesh strategy today, even if you don’t have a data mesh platform.

You can begin with the simplest possible foundation. Your smallest possible foundation framework is very likely composed of the data technologies that you have already adopted, especially if you are already operating analytics on the cloud. The bottom-layer utilities that you can use as the foundation include the typical storage technologies, data processing frameworks, federated query engines, and so on.

As the number of data products grows, standards develop, and common ways of approaching similar problems across data products are discovered. You then continue to evolve the platform as a harvested framework, collecting common capabilities across data products and domain teams.

Remember that the data mesh platform itself is a product. It’s an internal product—though built from many different tools and services from multiple vendors. The product users are the internal teams. It requires technical product ownership, long-term planning, and long-term maintenance. Though it continues to evolve and goes through evolutionary growth, its life begins today as a minimum viable product (MVP).9

Recap

Data mesh’s principle of a self-serve platform comes to the rescue to lower the cognitive load that the other two principles impose on the existing domain engineering teams: own your analytical data and share it as a product.

It shares common capabilities with the existing data platforms: providing access to polyglot storage, data processing engines, query engines, streaming, etc. However, it differs from existing platforms in its users: autonomous domain teams made up primarily of generalist technologists. It manages a higher-level construct of a data product encapsulating data, metadata, code, and policy as one unit.

Its purpose is to give domain teams superpowers, by hiding low-level complexity behind simpler abstractions and removing friction from their journeys in achieving their outcome of exchanging data products as a unit of value. And ultimately it frees up the teams to innovate with data. To scale out data sharing, beyond a single deployment environment or organizational unit or company, it favors decentralized solutions that are interoperable.

I will continue our deep dive into the platform in Chapter 10 and talk about specific services a data mesh platform could offer.

1 Geoffrey G. Parker, Marshall W. Van Alstyne, and Sangeet Paul Choudary, Platform Revolution (New York: W.W. Norton & Company, 2016).

2 John Sweller, “Cognitive Load During Problem Solving: Effects on Learning,” Cognitive Science, 12(2) (April 1988): 257–85.

3 Kief Morris, Infrastructure as Code (Sebastopol, CA: O’Reilly, 2021).

4 OpenLineage is an attempt to standardize tracing logs.

5 Eric Ries, “The Lean Startup”, September 8, 2008.

6 Jez Humble, Joanne Molesky, and Barry O’Reilly, Lean Enterprise (Sebastopol, CA: O’Reilly, 2015).

7 See the Amazon S3 API Reference as an example.

8 Saamer Akhshabi and Constantine Dovrolis, “The Evolution of Layered Protocol Stacks Leads to an Hourglass-Shaped Architecture”, SIGCOMM conference paper (2011).

9 Ries, “The Lean Startup.”
