
Streaming Systems

by Tyler Akidau, Slava Chernyak, Reuven Lax

July 2018

Beginner to intermediate

349 pages

10h 8m

English

O'Reilly Media, Inc.


Preface Or: What Are You Getting Yourself Into Here?
Navigating This Book; Takeaways; Conventions Used in This Book; Online Resources; Figures; Code Snippets; O’Reilly Safari; How to Contact Us; Acknowledgments
I. The Beam Model
1. Streaming 101
Terminology: What Is Streaming?; On the Greatly Exaggerated Limitations of Streaming; Event Time Versus Processing Time; Data Processing Patterns; Bounded Data; Unbounded Data: Batch; Unbounded Data: Streaming; Summary
2. The What, Where, When, and How of Data Processing
Roadmap; Batch Foundations: What and Where; What: Transformations; Where: Windowing; Going Streaming: When and How; When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!; When: Watermarks; When: Early/On-Time/Late Triggers FTW!; When: Allowed Lateness (i.e., Garbage Collection); How: Accumulation; Summary
3. Watermarks
Definition; Source Watermark Creation; Perfect Watermark Creation; Heuristic Watermark Creation; Watermark Propagation; Understanding Watermark Propagation; Watermark Propagation and Output Timestamps; The Tricky Case of Overlapping Windows; Percentile Watermarks; Processing-Time Watermarks; Case Studies; Case Study: Watermarks in Google Cloud Dataflow; Case Study: Watermarks in Apache Flink; Case Study: Source Watermarks for Google Cloud Pub/Sub; Summary
4. Advanced Windowing
When/Where: Processing-Time Windows; Event-Time Windowing; Processing-Time Windowing via Triggers; Processing-Time Windowing via Ingress Time; Where: Session Windows; Where: Custom Windowing; Variations on Fixed Windows; Variations on Session Windows; One Size Does Not Fit All; Summary
5. Exactly-Once and Side Effects
Why Exactly Once Matters; Accuracy Versus Completeness; Side Effects; Problem Definition; Ensuring Exactly Once in Shuffle; Addressing Determinism; Performance; Graph Optimization; Bloom Filters; Garbage Collection; Exactly Once in Sources; Exactly Once in Sinks; Use Cases; Example Source: Cloud Pub/Sub; Example Sink: Files; Example Sink: Google BigQuery; Other Systems; Apache Spark Streaming; Apache Flink; Summary
II. Streams and Tables
6. Streams and Tables
Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity; Toward a General Theory of Stream and Table Relativity; Batch Processing Versus Streams and Tables; A Streams and Tables Analysis of MapReduce; Reconciling with Batch Processing; What, Where, When, and How in a Streams and Tables World; What: Transformations; Where: Windowing; When: Triggers; How: Accumulation; A Holistic View of Streams and Tables in the Beam Model; A General Theory of Stream and Table Relativity; Summary
7. The Practicalities of Persistent State
Motivation; The Inevitability of Failure; Correctness and Efficiency; Implicit State; Raw Grouping; Incremental Combining; Generalized State; Case Study: Conversion Attribution; Conversion Attribution with Apache Beam; Summary

8. Streaming SQL
What Is Streaming SQL?; Relational Algebra; Time-Varying Relations; Streams and Tables; Looking Backward: Stream and Table Biases; The Beam Model: A Stream-Biased Approach; The SQL Model: A Table-Biased Approach; Looking Forward: Toward Robust Streaming SQL; Stream and Table Selection; Temporal Operators; Summary
9. Streaming Joins
All Your Joins Are Belong to Streaming; Unwindowed Joins; FULL OUTER; LEFT OUTER; RIGHT OUTER; INNER; ANTI; SEMI; Windowed Joins; Fixed Windows; Temporal Validity; Summary
10. The Evolution of Large-Scale Data Processing
MapReduce; Hadoop; Flume; Storm; Spark; MillWheel; Kafka; Cloud Dataflow; Flink; Beam; Summary
Index
About the Authors

Content preview from Streaming Systems

Preface Or: What Are You Getting Yourself Into Here?

Hello adventurous reader, welcome to our book! At this point, I assume that you’re either interested in learning more about the wonders of stream processing or hoping to spend a few hours reading about the glory of the majestic brown trout. Either way, I salute you! That said, those of you in the latter bucket who don’t also have an advanced understanding of computer science should consider how prepared you are to deal with disappointment before forging ahead; caveat piscator, and all that.

To set the tone for this book from the get-go, I wanted to give you a heads-up about a couple of things. First, this book is a little strange in that we have multiple authors, but we’re not pretending that we somehow all speak and write in the same voice like we’re weird identical triplets who happened to be born to different sets of parents. Because as interesting as that sounds, the end result would actually be less enjoyable to read. Instead, we’ve opted to each write in our own voices, and we’ve granted the book just enough self-awareness to be able to make reference to each of us where appropriate, but not so much self-awareness that it resents us for making it only into a book and not something cooler like a robot dinosaur with a Scottish accent.¹

As far as voices go, there are three you’ll come across:

Tyler: That would be me. If you haven’t explicitly been told someone else is speaking, you can assume that it’s me, because we added ...


Publisher Resources

ISBN: 9781491983867
