Chapter 1. Introducing data engineering design patterns
Design pattern is a well-established term in the software engineering space, but it has only recently begun gaining traction in the data engineering world. Consequently, I owe you a few words of introduction and explanation of what design patterns are in the context of data engineering.
What are design patterns?
You may be surprised how many times you rely on patterns in daily life. Let’s take the everyday example of cooking and one of my favorite desserts, flan; if you like creamy desserts and haven’t tried flan yet, I highly recommend it! When you want to prepare a flan, you need to get all the ingredients and follow a list of preparation steps. As an outcome, you get a tasty dessert.
Why am I giving this cooking example as the introduction to a technical book about design patterns? Preparing a recipe is a great representation of what a design pattern should be: a predefined and customizable template that solves a problem. How does the flan example fit this definition?
- Ingredients and the list of preparation steps are the predefined template. They give you instructions but remain customizable, as you might decide to use brown sugar instead of white, for example.
- There can be one or many uses. The flan can be a dessert you share with family at teatime, or a product you sell to make your living. This is the contextualization of a design pattern. Patterns always respond to a specific problem, which in this example is either the pleasure of sharing or business.
- You can decide to prepare this delicious dessert once or many times, if it happens to become your new favorite. For each new preparation you won’t reinvent the wheel; chances are you’ll rely on the same successful recipe you tried before. That’s the reusability of the pattern.
- But you must also be aware that preparing and eating flan has implications for your life and health. If you prepare it every day, you may have less time for sports and, as a result, some health issues in the long run. These are the consequences of a pattern.
- Finally, the recipe saves you time because it has been tested by many other people before you. Additionally, it introduces a common vocabulary that will make your life easier when discussing with other people. Finding a recipe for flan is easier than for a caramel custard, even though you’ll find results for both terms.
Now, how does all this relate to data engineering? Again, let’s use an example. You need to process a semi-structured dataset in a continuously running job. From time to time you might encounter a record with a completely invalid format that will throw an exception and stop your job. But you don’t want your whole job to fail because of a single malformed record. This is our contextualization.
To solve this processing issue, you’ll apply a set of best practices to your data processing logic, such as wrapping the risky transformation with a try-catch block to capture bad records and write them to another destination for analysis. That’s the predefined template. These are rules you can adapt to your specific use case. For example, you could decide not to send the bad records to another database and instead simply count their occurrences.
It turns out the example of handling erroneous records without breaking the pipeline has a specific name: Dead-Lettering. Now, if you encounter the same problem again but in a slightly different context, maybe while working on an ELT pipeline and performing the transformations directly in a data warehouse, you can apply the same logic. That’s the reusability of the pattern. Dead-Letter is one of the Error management patterns detailed in Chapter 3.
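To make the template concrete, here is a minimal sketch in Python. The JSON parsing, the transform callback, and the in-memory dead-letter list are illustrative assumptions; in a real pipeline the bad records would typically land in a dedicated table or topic instead.

```python
import json

def process(records, transform):
    """Apply transform to each record; route malformed ones to a dead-letter list."""
    results, dead_letters = [], []
    for raw in records:
        try:
            record = json.loads(raw)  # the risky step: the record may be malformed
            results.append(transform(record))
        except (json.JSONDecodeError, KeyError) as error:
            # Instead of failing the whole job, keep the bad record for later analysis.
            dead_letters.append({"record": raw, "error": str(error)})
    return results, dead_letters

good, bad = process(['{"user": "a"}', 'not json'], lambda r: r["user"])
print(good)  # ['a']
print(bad)   # one dead-lettered record, with the error that caused it
```

Notice that the job completes even though one record was unparseable; the failure is recorded rather than propagated.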
However, you shouldn’t follow the Dead-Letter pattern blindly. As for eating a flan every day, implementing the pattern has some consequences you should be aware of. Here, you add an extra logic that adds some extra complexity to the code base. You must be ready to accept them.
Finally, a data engineering design pattern represents a holistic picture of a solution to a given problem. It saves time and also introduces a common language that can greatly simplify discussions with your teammates or with data engineers you have just met.
Yet another set of design patterns?
If you write software, you’ve heard about the Gang of Four’s design patterns1 and maybe even consider them one of the pillars of clean code. And now you’re probably asking yourself: aren’t they enough for data engineering projects? Unfortunately, no.
Software design patterns are recipes you can use to keep a code base easily maintainable. Since the patterns are a standardized way to represent a given concept, they’re quickly understandable by any new person on the project.
For example, a pattern to avoid allocating unnecessary objects is the Singleton. A newcomer aware of design patterns can quickly identify it and understand its purpose in the code.
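As an illustration, here is a minimal Singleton sketch in Python; the ConnectionPool name is only a hypothetical example of an object you would not want to allocate twice.

```python
class ConnectionPool:
    """A minimal Singleton sketch: only one pool instance is ever created."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

pool_a = ConnectionPool()
pool_b = ConnectionPool()
assert pool_a is pool_b  # both names point to the same, single instance
```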
Writing maintainable code does indeed apply to data engineering projects, but it’s not enough. Besides the pure software aspects, you need to think about the data ones, such as the aforementioned failure management, backfilling, idempotency, or data correctness.
Common data engineering patterns
The failed-record management from the previous section is only one example of the many design patterns you’ll find in this book. Their organization follows a typical data flow where everything starts with the Data ingestion patterns. Once you have data coming in, you can start applying transformation logic and face the many challenges related to it. You might have errors to deal with. You might need to rerun an already completed job2. Both issues are handled with the Error management and Idempotency patterns.
After defining the strategies to deal with issues that could degrade data quality and propagate poor data to downstream consumers, you’ll see how to leverage Data value patterns to generate valuable datasets for your business domains.
Unfortunately, those datasets are rarely the final parts of a data engineering system. Often, other pipelines transform or combine them later to provide even more data value. That’s why in the following chapter you’ll discover Data flow patterns, which also mark the end of the patterns for data processing.
However, a data engineer’s responsibility doesn’t stop here. There are other important aspects to consider, starting with Data security patterns. In that chapter you’ll see how to handle data privacy requirements and how to reduce the risk of intrusion.
Next, in the Data storage patterns chapter, you’ll see how to leverage the storage layer for optimized data organization. But again, storing data is not enough, and it’s rarely the last step in a data engineer’s scope. That’s the reason why you’ll also see Data quality and Data observability patterns. They should help guarantee that the stored data can be trusted.
Case study used in the book
The design patterns in this book are not tied to one specific business domain. However, understanding them without any business context would be hard, especially for less experienced readers. For that reason, you’ll see each pattern introduced in the context of our case study project, a blog data analytics platform.
Our project follows common data practices and is divided into the layers presented in Figure 1-1, which highlights the three most important parts of the project:
- Online and offline data ingestion components. The online part applies to the data generated by users interacting with the blogs hosted on our platform. The offline part, marked here as “data provider”, applies to static external or internal datasets, such as reference datasets, produced on a more regular schedule than the visit events, for example once an hour.
- The real-time layer, where you can find streaming jobs processing event data from a streaming broker. The jobs here may be one of two types. The first is a business-facing job that generates data for the stakeholders, such as a real-time session aggregation. The second type is a technical job that often acts as an enabler for other business use cases. An example here would be data synchronization to the at-rest data storage for ad hoc querying.
- The third layer follows a dataset organization that is common nowadays, based on the Medallion architecture3 principle, where a dataset may live in one of three areas: Bronze, Silver, and Gold. Each of them applies to a different data maturity level. The Bronze area stores data in its raw format, unaltered and possibly with serious data quality issues. The Silver layer is responsible for the cleansed and enriched datasets. Finally, the Gold area exposes data in the format expected by the final users, such as data marts or reference datasets.
Why is this three-area storage layout interesting in the context of this book? Each layer represents a different data maturity level, exactly like the design patterns presented here. The ones impacting business value will mostly expose data in the Gold area, while the others will remain behind, in the Bronze or Silver layers. The Problem statement sections of the patterns may reference these areas to help you better understand the issue at hand.
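To fix the idea, here is a minimal sketch of how a dataset might move through the three areas. The visit-event schema, the cleansing rules, and the aggregation are hypothetical illustrations, not part of the case study itself.

```python
import pandas as pd

# Bronze: raw visit events, stored as ingested; quality issues are expected here.
bronze = pd.DataFrame([
    {"user": "a", "page": "/post/1", "duration": 12},
    {"user": None, "page": "/post/1", "duration": -5},  # a malformed event
])

# Silver: the cleansed version of the Bronze dataset.
silver = bronze.dropna(subset=["user"]).query("duration >= 0")

# Gold: a business-facing aggregate, e.g. total time spent per page.
gold = silver.groupby("page", as_index=False)["duration"].sum()
print(gold)
```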
The diagram doesn’t present any implementation details on purpose. Focusing on them could shift your attention to the technology instead of the universal, pattern-based solutions that are the main topic of this book. But that doesn’t mean you won’t see any technical details in the next chapters. On the contrary! Each pattern has a dedicated Examples section where you will see different implementations of the presented pattern.
Summary
After this first chapter you should understand not only that flan is a great creamy dessert but also that its recipe is a great analogy for the data engineering design patterns you will discover in the next 11 chapters. I know, it’s a lot, but with a cup of coffee or tea and your favorite dessert (why not flan!), it’ll be an exciting learning journey!
1 Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley Professional) is colloquially known as the “Gang of Four” book because of its four authors, who share 23 standard software engineering patterns.
2 This is known as reprocessing. To avoid confusion, from now on we’re going to refer to any task processing past data as backfilling, whether or not the data has already been processed. Technically there is a small difference between reprocessing and backfilling, which you can learn about in the Glossary available on GitHub.
3 You can learn more about the Medallion architecture in Chapter 4 of “Delta Lake: The Definitive Guide”, available at https://www.oreilly.com/library/view/delta-lake-the/9781098151935/.