
Mastering Kafka Streams and ksqlDB

Name: Mastering Kafka Streams and ksqlDB
Author: Mitch Seymour
ISBN: 9781492062493

by Mitch Seymour

February 2021

Intermediate to advanced

432 pages

11h 7m

English

O'Reilly Media, Inc.


Foreword
Preface
Who Should Read This Book; Navigating This Book; Source Code; Kafka Streams Version; ksqlDB Version; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
I. Kafka
1. A Rapid Introduction to Kafka
Communication Model; How Are Streams Stored?; Topics and Partitions; Events; Kafka Cluster and Brokers; Consumer Groups; Installing Kafka; Hello, Kafka; Summary
II. Kafka Streams
2. Getting Started with Kafka Streams
The Kafka Ecosystem; Before Kafka Streams; Enter Kafka Streams; Features at a Glance; Operational Characteristics; Scalability; Reliability; Maintainability; Comparison to Other Systems; Deployment Model; Processing Model; Kappa Architecture; Use Cases; Processor Topologies; Sub-Topologies; Depth-First Processing; Benefits of Dataflow Programming; Tasks and Stream Threads; High-Level DSL Versus Low-Level Processor API; Introducing Our Tutorial: Hello, Streams; Project Setup; Creating a New Project; Adding the Kafka Streams Dependency; DSL; Processor API; Streams and Tables; Stream/Table Duality; KStream, KTable, GlobalKTable; Summary
3. Stateless Processing
Stateless Versus Stateful Processing; Introducing Our Tutorial: Processing a Twitter Stream; Project Setup; Adding a KStream Source Processor; Serialization/Deserialization; Building a Custom Serdes; Defining Data Classes; Implementing a Custom Deserializer; Implementing a Custom Serializer; Building the Tweet Serdes; Filtering Data; Branching Data; Translating Tweets; Merging Streams; Enriching Tweets; Avro Data Class; Sentiment Analysis; Serializing Avro Data; Registryless Avro Serdes; Schema Registry–Aware Avro Serdes; Adding a Sink Processor; Running the Code; Empirical Verification; Summary
4. Stateful Processing
Benefits of Stateful Processing; Preview of Stateful Operators; State Stores; Common Characteristics; Persistent Versus In-Memory Stores; Introducing Our Tutorial: Video Game Leaderboard; Project Setup; Data Models; Adding the Source Processors; KStream; KTable; GlobalKTable; Registering Streams and Tables; Joins; Join Operators; Join Types; Co-Partitioning; Value Joiners; KStream to KTable Join (players Join); KStream to GlobalKTable Join (products Join); Grouping Records; Grouping Streams; Grouping Tables; Aggregations; Aggregating Streams; Aggregating Tables; Putting It All Together; Interactive Queries; Materialized Stores; Accessing Read-Only State Stores; Querying Nonwindowed Key-Value Stores; Local Queries; Remote Queries; Summary
5. Windows and Time
Introducing Our Tutorial: Patient Monitoring Application; Project Setup; Data Models; Time Semantics; Timestamp Extractors; Included Timestamp Extractors; Custom Timestamp Extractors; Registering Streams with a Timestamp Extractor; Windowing Streams; Window Types; Selecting a Window; Windowed Aggregation; Emitting Window Results; Grace Period; Suppression; Filtering and Rekeying Windowed KTables; Windowed Joins; Time-Driven Dataflow; Alerts Sink; Querying Windowed Key-Value Stores; Summary
6. Advanced State Management
Persistent Store Disk Layout; Fault Tolerance; Changelog Topics; Standby Replicas; Rebalancing: Enemy of the State (Store); Preventing State Migration; Sticky Assignment; Static Membership; Reducing the Impact of Rebalances; Incremental Cooperative Rebalancing; Controlling State Size; Deduplicating Writes with Record Caches; State Store Monitoring; Adding State Listeners; Adding State Restore Listeners; Built-in Metrics; Interactive Queries; Custom State Stores; Summary
7. Processor API
When to Use the Processor API; Introducing Our Tutorial: IoT Digital Twin Service; Project Setup; Data Models; Adding Source Processors; Adding Stateless Stream Processors; Creating Stateless Processors; Creating Stateful Processors; Periodic Functions with Punctuate; Accessing Record Metadata; Adding Sink Processors; Interactive Queries; Putting It All Together; Combining the Processor API with the DSL; Processors and Transformers; Putting It All Together: Refactor; Summary
III. ksqlDB
8. Getting Started with ksqlDB
What Is ksqlDB?; When to Use ksqlDB; Evolution of a New Kind of Database; Kafka Streams Integration; Connect Integration; How Does ksqlDB Compare to a Traditional SQL Database?; Similarities; Differences; Architecture; ksqlDB Server; ksqlDB Clients; Deployment Modes; Interactive Mode; Headless Mode; Tutorial; Installing ksqlDB; Running a ksqlDB Server; Precreating Topics; Using the ksqlDB CLI; Summary
9. Data Integration with ksqlDB
Kafka Connect Overview; External Versus Embedded Connect; External Mode; Embedded Mode; Configuring Connect Workers; Converters and Serialization Formats; Tutorial; Installing Connectors; Creating Connectors with ksqlDB; Showing Connectors; Describing Connectors; Dropping Connectors; Verifying the Source Connector; Interacting with the Kafka Connect Cluster Directly; Introspecting Managed Schemas; Summary
10. Stream Processing Basics with ksqlDB
Tutorial: Monitoring Changes at Netflix; Project Setup; Source Topics; Data Types; Custom Types; Collections; Creating Source Collections; With Clause; Working with Streams and Tables; Showing Streams and Tables; Describing Streams and Tables; Altering Streams and Tables; Dropping Streams and Tables; Basic Queries; Insert Values; Simple Selects (Transient Push Queries); Projection; Filtering; Flattening/Unnesting Complex Structures; Conditional Expressions; Coalesce; IFNULL; Case Statements; Writing Results Back to Kafka (Persistent Queries); Creating Derived Collections; Putting It All Together; Summary
11. Intermediate and Advanced Stream Processing with ksqlDB
Project Setup; Bootstrapping an Environment from a SQL File; Data Enrichment; Joins; Windowed Joins; Aggregations; Aggregation Basics; Windowed Aggregations; Materialized Views; Clients; Pull Queries; Curl; Push Queries; Push Queries via Curl; Functions and Operators; Operators; Showing Functions; Describing Functions; Creating Custom Functions; Additional Resources for Custom ksqlDB Functions; Summary
IV. The Road to Production
12. Testing, Monitoring, and Deployment
Testing; Testing ksqlDB Queries; Testing Kafka Streams; Behavioral Tests; Benchmarking; Kafka Cluster Benchmarking; Final Thoughts on Testing; Monitoring; Monitoring Checklist; Extracting JMX Metrics; Deployment; ksqlDB Containers; Kafka Streams Containers; Container Orchestration; Operations; Resetting a Kafka Streams Application; Rate-Limiting the Output of Your Application; Upgrading Kafka Streams; Upgrading ksqlDB; Summary
A. Kafka Streams Configuration
Configuration Management; Configuration Properties; Consumer-Specific Configurations
B. ksqlDB Configuration
Query Configurations; Server Configurations; Security Configurations
Index

Content preview from Mastering Kafka Streams and ksqlDB

Preface

For data engineers and data scientists, there’s never a shortage of technologies that are competing for our attention. Whether we’re perusing our favorite subreddits, scanning Hacker News, reading tech blogs, or weaving through hundreds of tables at a tech conference, there are so many things to look at that it can start to feel overwhelming.

But if we can find a quiet corner to just think for a minute, and let all of the buzz fade into the background, we can start to distinguish patterns from the noise. You see, we live in the age of explosive data growth, and many of these technologies were created to help us store and process data at scale. We’re told that these are modern solutions for modern problems, and we sit around discussing “big data” as if the idea is avant-garde, when really the focus on data volume is only half the story.

Technologies that only solve for the data volume problem tend to have batch-oriented techniques for processing data. This involves running a job on some pile of data that has accumulated for a period of time. In some ways, this is like trying to drink the ocean all at once. With modern computing power and paradigms, some technologies actually manage to achieve this, though usually at the expense of high latency.

Instead, there’s another property of modern data that we focus on in this book: data moves over networks in steady and never-ending streams. The technologies we cover in this book, Kafka Streams and ksqlDB, are specifically designed ...


Publisher Resources

ISBN: 9781492062486

