Streaming Data Mesh

by Hubert Dulay, Stephen Mooney

May 2023

Intermediate to advanced

223 pages

5h 58m

English

O'Reilly Media, Inc.

Who Should Read This Book; Why We Wrote This Book; Navigating This Book; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments; Hubert; Stephen
Data Divide; Data Mesh Pillars; Data Ownership; Data as a Product; Federated Computational Data Governance; Self-Service Data Platform; Data Mesh Diagram; Other Similar Architectural Patterns; Data Fabric; Data Gateways and Data Services; Data Democratization; Data Virtualization; Focusing on Implementation; Apache Kafka; AsyncAPI
The Streaming Advantage; Streaming Enables Real-Time Use Cases; Streaming Enables Data Optimization Advantages; Reverse ETL; The Kappa Architecture; Lambda Architecture Introduction; Kappa Architecture Introduction; Summary
Identifying Domains; Discernible Domains; Geographic Regions; Hybrid Architecture; Multicloud; Avoiding Ambiguous Domains; Domain-Driven Design; Domain Model; Domain Logic; Bounded Context; The Ubiquitous Language; Data Mesh Domain Roles; Data Product Engineer; Data Product Owner or Data Steward; Streaming Data Mesh Tools and Platforms to Consider; Domain Charge-Backs; Summary
Defining Data Product Requirements; Identifying Data Product Derivatives; Derivatives from Other Domains; Ingesting Data Product Derivatives with Kafka Connect; Consumability; Synchronous Data Sources; Asynchronous Data Sources and Change Data Capture; Debezium Connectors; Transforming Data Derivatives to Data Products; Data Standardization; Protecting Sensitive Information; SQL; Extract, Transform, and Load; Publishing Data Products with AsyncAPI; Registering the Streaming Data Product; Building an AsyncAPI YAML Document; Assigning Data Tags; Versioning; Monitoring; Summary
Data Governance in a Streaming Data Mesh; Data Lineage Graph; Streaming Data Catalog to Organize Data Products; Metadata; Schemas; Lineage; Security; Scalability; Generating the Data Product Page from AsyncAPI; Apicurio Registry; Access Workflow; Centralized Versus Decentralized; Centralized Engineers; Decentralized (Domain) Engineers; Summary
Streaming Data Mesh CLI; Resource-Related Commands; Cluster-Related Commands; Topic-Related Commands; The domain Commands; The connect Commands; The streaming Commands; Publishing a Streaming Data Product; Data Governance-Related Services; Security Services; Standards Services; Lineage Services; SaaS Services and APIs; Summary
Infrastructure; Two Architecture Solutions; Dedicated Infrastructure; Multitenant Infrastructure; Streaming Data Mesh Central Architecture; The Domain Agent (aka Sidecar); Data Plane; Control Plane; Summary
The Traditional Data Warehouse Structure; Introducing the Decentralized Team Structure; Empowering People; Working Processes; Fostering Collaboration; Data-Driven Automation; New Roles in Data Domains; New Roles in the Data Plane; New Roles in Data Science and Business Intelligence
Separating Data Engineering from Data Science; Online and Offline Data Stores; Apache Feast Introduction; Summary

Streaming Data Mesh Example; Deploying an On-Premises Streaming Data Mesh; Installing a Connector; Deploying Clickstream Connector and Auto-Creating Tables; Deploying the Debezium Postgres CDC Connector; Enrichment of Streaming Data; Publishing the Data Product; Consuming Streaming Data Products; Fully Managed SaaS Services; Summary and Considerations

Content preview from Streaming Data Mesh

Chapter 7. Architecting a Streaming Data Mesh

In Chapters 3 through 6, we covered the pillars of a streaming data mesh. Now we will use that knowledge to architect a streaming data mesh. As we mentioned earlier in this book, the term “mesh” in “data mesh” was taken from the term “service mesh” in microservice architectures. We build upon that similarity to describe the parts of a streaming data mesh by using the same terms used to describe parts of a microservice architecture. We will describe each part of the architecture, so knowledge of microservice architecture is not a prerequisite. We will also consider multiple streaming data mesh solutions and list their benefits and trade-offs. The outcome will be an easy and clear framework that can be used to implement your own streaming data mesh.

Infrastructure

As stated in Chapter 1, we will implement a streaming data mesh with Kafka. Kafka is optional and can be replaced with Apache Pulsar or Redpanda; whichever you choose, we recommend a fully managed, serverless streaming platform so that you are relieved of the tasks of self-managing infrastructure. Likewise, we will use ksqlDB as the stream processing engine, which is available as either a fully managed or a self-managed service. The following are some fully managed stream processing options:

DeltaStream
Popsink
Decodable
Materialize
RisingWave
Timeplus

Kafka provides the streaming platform, while ksqlDB is a stream processing engine that uses SQL as the primary way of building streaming data pipelines. ...
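To make "SQL as the primary way of building streaming data pipelines" concrete, here is a minimal sketch of a two-statement ksqlDB pipeline, wrapped in the JSON body that a ksqlDB server accepts on its `/ksql` REST endpoint. The topic name `clickstream`, the stream names `clicks` and `checkout_clicks`, the column names, and the helper `build_ksql_request` are all hypothetical and chosen only for illustration; they are not taken from the book. The sketch only builds and prints the request bodies and does not contact a server.

```python
import json

# Hypothetical Kafka topic name, used only for illustration.
SOURCE_TOPIC = "clickstream"


def build_ksql_request(statement: str) -> dict:
    """Wrap a ksqlDB statement in the JSON body used by the /ksql REST endpoint."""
    return {"ksql": statement, "streamsProperties": {}}


# Declare a stream over an existing Kafka topic (assumed schema).
CREATE_SOURCE = f"""
CREATE STREAM clicks (user_id VARCHAR KEY, url VARCHAR)
  WITH (KAFKA_TOPIC='{SOURCE_TOPIC}', VALUE_FORMAT='JSON');
""".strip()

# Derive a second, continuously updated stream with a persistent query.
CREATE_DERIVED = """
CREATE STREAM checkout_clicks AS
  SELECT user_id, url FROM clicks
  WHERE url LIKE '%/checkout%'
  EMIT CHANGES;
""".strip()

if __name__ == "__main__":
    # Print the request bodies; sending them to a server is left out on purpose.
    for stmt in (CREATE_SOURCE, CREATE_DERIVED):
        print(json.dumps(build_ksql_request(stmt), indent=2))
```

In this sketch the whole pipeline is expressed as SQL text; the surrounding Python only packages it for submission, which is the sense in which SQL is the primary building block.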

ISBN: 9781098130718