Mastering Kafka Streams and ksqlDB

Name: Mastering Kafka Streams and ksqlDB
Author: Mitch Seymour
ISBN: 9781492062493

by Mitch Seymour

February 2021

Intermediate to advanced

432 pages

11h 7m

English

O'Reilly Media, Inc.

Foreword
Preface
Who Should Read This Book; Navigating This Book; Source Code; Kafka Streams Version; ksqlDB Version; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
I. Kafka
1. A Rapid Introduction to Kafka
Communication Model; How Are Streams Stored?; Topics and Partitions; Events; Kafka Cluster and Brokers; Consumer Groups; Installing Kafka; Hello, Kafka; Summary
II. Kafka Streams
2. Getting Started with Kafka Streams
The Kafka Ecosystem; Before Kafka Streams; Enter Kafka Streams; Features at a Glance; Operational Characteristics; Scalability; Reliability; Maintainability; Comparison to Other Systems; Deployment Model; Processing Model; Kappa Architecture; Use Cases; Processor Topologies; Sub-Topologies; Depth-First Processing; Benefits of Dataflow Programming; Tasks and Stream Threads; High-Level DSL Versus Low-Level Processor API; Introducing Our Tutorial: Hello, Streams; Project Setup; Creating a New Project; Adding the Kafka Streams Dependency; DSL; Processor API; Streams and Tables; Stream/Table Duality; KStream, KTable, GlobalKTable; Summary
3. Stateless Processing
Stateless Versus Stateful Processing; Introducing Our Tutorial: Processing a Twitter Stream; Project Setup; Adding a KStream Source Processor; Serialization/Deserialization; Building a Custom Serdes; Defining Data Classes; Implementing a Custom Deserializer; Implementing a Custom Serializer; Building the Tweet Serdes; Filtering Data; Branching Data; Translating Tweets; Merging Streams; Enriching Tweets; Avro Data Class; Sentiment Analysis; Serializing Avro Data; Registryless Avro Serdes; Schema Registry–Aware Avro Serdes; Adding a Sink Processor; Running the Code; Empirical Verification; Summary
4. Stateful Processing
Benefits of Stateful Processing; Preview of Stateful Operators; State Stores; Common Characteristics; Persistent Versus In-Memory Stores; Introducing Our Tutorial: Video Game Leaderboard; Project Setup; Data Models; Adding the Source Processors; KStream; KTable; GlobalKTable; Registering Streams and Tables; Joins; Join Operators; Join Types; Co-Partitioning; Value Joiners; KStream to KTable Join (players Join); KStream to GlobalKTable Join (products Join); Grouping Records; Grouping Streams; Grouping Tables; Aggregations; Aggregating Streams; Aggregating Tables; Putting It All Together; Interactive Queries; Materialized Stores; Accessing Read-Only State Stores; Querying Nonwindowed Key-Value Stores; Local Queries; Remote Queries; Summary
5. Windows and Time
Introducing Our Tutorial: Patient Monitoring Application; Project Setup; Data Models; Time Semantics; Timestamp Extractors; Included Timestamp Extractors; Custom Timestamp Extractors; Registering Streams with a Timestamp Extractor; Windowing Streams; Window Types; Selecting a Window; Windowed Aggregation; Emitting Window Results; Grace Period; Suppression; Filtering and Rekeying Windowed KTables; Windowed Joins; Time-Driven Dataflow; Alerts Sink; Querying Windowed Key-Value Stores; Summary
6. Advanced State Management
Persistent Store Disk Layout; Fault Tolerance; Changelog Topics; Standby Replicas; Rebalancing: Enemy of the State (Store); Preventing State Migration; Sticky Assignment; Static Membership; Reducing the Impact of Rebalances; Incremental Cooperative Rebalancing; Controlling State Size; Deduplicating Writes with Record Caches; State Store Monitoring; Adding State Listeners; Adding State Restore Listeners; Built-in Metrics; Interactive Queries; Custom State Stores; Summary
7. Processor API
When to Use the Processor API; Introducing Our Tutorial: IoT Digital Twin Service; Project Setup; Data Models; Adding Source Processors; Adding Stateless Stream Processors; Creating Stateless Processors; Creating Stateful Processors; Periodic Functions with Punctuate; Accessing Record Metadata; Adding Sink Processors; Interactive Queries; Putting It All Together; Combining the Processor API with the DSL; Processors and Transformers; Putting It All Together: Refactor; Summary
III. ksqlDB
8. Getting Started with ksqlDB
What Is ksqlDB?; When to Use ksqlDB; Evolution of a New Kind of Database; Kafka Streams Integration; Connect Integration; How Does ksqlDB Compare to a Traditional SQL Database?; Similarities; Differences; Architecture; ksqlDB Server; ksqlDB Clients; Deployment Modes; Interactive Mode; Headless Mode; Tutorial; Installing ksqlDB; Running a ksqlDB Server; Precreating Topics; Using the ksqlDB CLI; Summary
9. Data Integration with ksqlDB
Kafka Connect Overview; External Versus Embedded Connect; External Mode; Embedded Mode; Configuring Connect Workers; Converters and Serialization Formats; Tutorial; Installing Connectors; Creating Connectors with ksqlDB; Showing Connectors; Describing Connectors; Dropping Connectors; Verifying the Source Connector; Interacting with the Kafka Connect Cluster Directly; Introspecting Managed Schemas; Summary
10. Stream Processing Basics with ksqlDB
Tutorial: Monitoring Changes at Netflix; Project Setup; Source Topics; Data Types; Custom Types; Collections; Creating Source Collections; With Clause; Working with Streams and Tables; Showing Streams and Tables; Describing Streams and Tables; Altering Streams and Tables; Dropping Streams and Tables; Basic Queries; Insert Values; Simple Selects (Transient Push Queries); Projection; Filtering; Flattening/Unnesting Complex Structures; Conditional Expressions; Coalesce; IFNULL; Case Statements; Writing Results Back to Kafka (Persistent Queries); Creating Derived Collections; Putting It All Together; Summary
11. Intermediate and Advanced Stream Processing with ksqlDB
Project Setup; Bootstrapping an Environment from a SQL File; Data Enrichment; Joins; Windowed Joins; Aggregations; Aggregation Basics; Windowed Aggregations; Materialized Views; Clients; Pull Queries; Curl; Push Queries; Push Queries via Curl; Functions and Operators; Operators; Showing Functions; Describing Functions; Creating Custom Functions; Additional Resources for Custom ksqlDB Functions; Summary
IV. The Road to Production
12. Testing, Monitoring, and Deployment
Testing; Testing ksqlDB Queries; Testing Kafka Streams; Behavioral Tests; Benchmarking; Kafka Cluster Benchmarking; Final Thoughts on Testing; Monitoring; Monitoring Checklist; Extracting JMX Metrics; Deployment; ksqlDB Containers; Kafka Streams Containers; Container Orchestration; Operations; Resetting a Kafka Streams Application; Rate-Limiting the Output of Your Application; Upgrading Kafka Streams; Upgrading ksqlDB; Summary
A. Kafka Streams Configuration
Configuration Management; Configuration Properties; Consumer-Specific Configurations
B. ksqlDB Configuration
Query Configurations; Server Configurations; Security Configurations
Index

Overview

Working with unbounded and fast-moving data streams has historically been difficult. But with Kafka Streams and ksqlDB, building stream processing applications is easy and fun. This practical guide shows data engineers how to use these tools to build highly scalable stream processing applications for moving, enriching, and transforming large amounts of data in real time.

Mitch Seymour, data services engineer at Mailchimp, explains important stream processing concepts against a backdrop of several interesting business problems. You'll learn the strengths of both Kafka Streams and ksqlDB to help you choose the best tool for each unique stream processing project. Non-Java developers will find the ksqlDB path to be an especially gentle introduction to stream processing.

Learn the basics of Kafka and the pub/sub communication pattern
Build stateless and stateful stream processing applications using Kafka Streams and ksqlDB
Perform advanced stateful operations, including windowed joins and aggregations
Understand how stateful processing works under the hood
Learn about ksqlDB's data integration features, powered by Kafka Connect
Work with different types of collections in ksqlDB and perform push and pull queries
Deploy your Kafka Streams and ksqlDB applications to production

Publisher Resources

ISBN: 9781492062486
Errata Page
