Data Mesh

by Zhamak Dehghani

March 2022

Beginner to intermediate

384 pages

10h 54m

English

O'Reilly Media, Inc.

Why I Wrote This Book and Why Now; Who Should Read This Book; How to Read This Book; Conventions Used in This Book; O’Reilly Online Learning; How to Contact Us; Acknowledgments
Data Mesh in Action; A Culture of Data Curiosity and Experimentation; An Embedded Partnership with Data and ML; The Invisible Platform and Policies; Limitless Scale with Autonomous Data Products; The Positive Network Effect; Why Transform to Data Mesh?; The Way Forward
The Outcomes; The Shifts; The Principles; Principle of Domain Ownership; Principle of Data as a Product; Principle of the Self-Serve Data Platform; Principle of Federated Computational Governance; Interplay of the Principles; Data Mesh Model at a Glance; The Data; Operational Data; Analytical Data; The Origin
A Brief Background on Domain-Driven Design; Applying DDD’s Strategic Design to Data; Domain Data Archetypes; Source-Aligned Domain Data; Aggregate Domain Data; Consumer-Aligned Domain Data; Transition to Domain Ownership; Push Data Ownership Upstream; Define Multiple Connected Models; Embrace the Most Relevant Domain Data: Don’t Expect a Single Source of Truth; Hide the Data Pipelines as Domains’ Internal Implementation; Recap
Applying Product Thinking to Data; Baseline Usability Attributes of a Data Product; Transition to Data as a Product; Include Data Product Ownership in Domains; Reframe the Nomenclature to Create Change; Think of Data as a Product, Not a Mere Asset; Establish a Trust-But-Verify Data Culture; Join Data and Compute as One Logical Unit; Recap
Data Mesh Platform: Compare and Contrast; Serving Autonomous Domain-Oriented Teams; Managing Autonomous and Interoperable Data Products; A Continuous Platform of Operational and Analytical Capabilities; Designed for a Generalist Majority; Favoring Decentralized Technologies; Domain Agnostic; Data Mesh Platform Thinking; Enable Autonomous Teams to Get Value from Data; Exchange Value with Autonomous and Interoperable Data Products; Accelerate Exchange of Value by Lowering the Cognitive Load; Scale Out Data Sharing; Support a Culture of Embedded Innovation; Transition to a Self-Serve Data Mesh Platform; Design the APIs and Protocols First; Prepare for Generalist Adoption; Do an Inventory and Simplify; Create Higher-Level APIs to Manage Data Products; Build Experiences, Not Mechanisms; Begin with the Simplest Foundation, Then Harvest to Evolve; Recap
Apply Systems Thinking to Data Mesh Governance; Maintain Dynamic Equilibrium Between Domain Autonomy and Global Interoperability; Embrace Dynamic Topology as a Default State; Utilize Automation and the Distributed Architecture; Apply Federation to the Governance Model; Federated Team; Guiding Values; Policies; Incentives; Apply Computation to the Governance Model; Standards as Code; Policies as Code; Automated Tests; Automated Monitoring; Transition to Federated Computational Governance; Delegate Accountability to Domains; Embed Policy Execution in Each Data Product; Automate Enablement and Monitoring over Interventions; Model the Gaps; Measure the Network Effect; Embrace Change over Constancy; Recap

Great Expectations of Data; The Great Divide of Data; Scale: Encounter of a New Kind; Beyond Order; Approaching the Plateau of Return; Recap
Respond Gracefully to Change in a Complex Business; Align Business, Tech, and Now Analytical Data; Close the Gap Between Analytical and Operational Data; Localize Data Changes to Business Domains; Reduce Accidental Complexity of Pipelines and Copying Data; Sustain Agility in the Face of Growth; Remove Centralized and Monolithic Bottlenecks; Reduce Coordination of Data Pipelines; Reduce Coordination of Data Governance; Enable Autonomy; Increase the Ratio of Value from Data to Investment; Abstract Technical Complexity with a Data Platform; Embed Product Thinking Everywhere; Go Beyond the Boundaries; Recap
Evolution of Analytical Data Architectures; First Generation: Data Warehouse Architecture; Second Generation: Data Lake Architecture; Third Generation: Multimodal Cloud Architecture; Characteristics of Analytical Data Architecture; Monolithic; Centralized Data Ownership; Technology Oriented; Recap
Domain-Oriented Analytical Data Sharing Interfaces; Operational Interface Design; Analytical Data Interface Design; Interdomain Analytical Data Dependencies; Data Product as an Architecture Quantum; A Data Product’s Structural Components; Data Product Data Sharing Interactions; Data Discovery and Observability APIs; The Multiplane Data Platform; A Platform Plane; Data Infrastructure (Utility) Plane; Data Product Experience Plane; Mesh Experience Plane; Example; Embedded Computational Policies; Data Product Sidecar; Data Product Computational Container; Control Port; Recap
Design a Platform Driven by User Journeys; Data Product Developer Journey; Incept, Explore, Bootstrap, and Source; Build, Test, Deploy, and Run; Maintain, Evolve, and Retire; Data Product Consumer Journey; Incept, Explore, Bootstrap, Source; Build, Test, Deploy, Run; Maintain, Evolve, and Retire; Recap
Data Product Affordances; Data Product Architecture Characteristics; Design Influenced by the Simplicity of Complex Adaptive Systems; Emergent Behavior from Simple Local Rules; No Central Orchestrator; Recap
Serve Data; The Needs of Data Users; Serve Data Design Properties; Serve Data Design; Consume Data; Archetypes of Data Sources; Locality of Data Consumption; Data Consumption Design; Transform Data; Programmatic Versus Nonprogrammatic Transformation; Dataflow-Based Transformation; ML as Transformation; Time-Variant Transformation; Transformation Design; Recap
Discover, Understand, Trust, and Explore; Begin Discovery with Self-Registration; Discover the Global URI; Understand Semantic and Syntax Models; Establish Trust with Data Guarantees; Explore the Shape of Data; Learn with Documentation; Discover, Explore, and Understand Design; Compose Data; Consume Data Design Properties; Traditional Approaches to Data Composability; Compose Data Design; Recap
Manage the Life Cycle; Manage Life-Cycle Design; Data Product Manifest Components; Govern Data; Govern Data Design; Standardize Policies; Data and Policy Integration; Linking Policies; Observe, Debug, and Audit; Observability Design; Recap
Should You Adopt Data Mesh Today?; Data Mesh as an Element of Data Strategy; Data Mesh Execution Framework; Business-Driven Execution; End-to-End and Iterative Execution; Evolutionary Execution; Recap
Change; Culture; Values; Reward; Intrinsic Motivations; Extrinsic Motivations; Structure; Organization Structure Assumptions; Discover Data Product Boundaries; People; Roles; Skillset Development; Process; Key Process Changes; Recap

Content preview from Data Mesh

Chapter 4. Principle of the Self-Serve Data Platform

Simplicity is about subtracting the obvious and adding the meaningful.

John Maeda

So far I have offered two fundamental shifts toward data mesh: a distributed data architecture and ownership model oriented around business domains, and data shared as a usable and valuable product. Over time, these two seemingly simple and rather intuitive shifts can have undesired consequences: duplication of efforts in each domain, increased cost of operation, and likely large-scale inconsistencies and incompatibilities across domains.

Expecting domain engineering teams to own and share analytical data as a product, in addition to building applications and maintaining digital products, raises legitimate concerns for both the practitioners and their leaders. The concerns that I often hear from leaders, at this point in the conversation, include: “How am I going to manage the cost of operating the domain data products, if every domain needs to build and own its own data?” “How do I hire the data engineers, who are already hard to find, to staff in every domain?” “This seems like a lot of overengineering and duplicate effort in each team.” “What technology do I buy to provide all the data product usability characteristics?” “How do I enforce governance in a distributed fashion to avoid chaos?” “What about copied data—how do I manage that?” And so on. Similarly, domain engineering teams and practitioners voice concerns such as, “How can we extend ...