book

Kafka: The Definitive Guide

by Neha Narkhede, Gwen Shapira, Todd Palino

September 2017

Beginner to intermediate

319 pages

9h 10m

English

O'Reilly Media, Inc.

Read now

Unlock full access

Foreword
Preface
Who Should Read This BookConventions Used in This BookUsing Code ExamplesO’Reilly Online LearningHow to Contact UsAcknowledgments
1. Meet Kafka
Publish/Subscribe MessagingHow It StartsIndividual Queue SystemsEnter KafkaMessages and BatchesSchemasTopics and PartitionsProducers and ConsumersBrokers and ClustersMultiple ClustersWhy Kafka?Multiple ProducersMultiple ConsumersDisk-Based RetentionScalableHigh PerformanceThe Data EcosystemUse CasesKafka’s OriginLinkedIn’s ProblemThe Birth of KafkaOpen SourceThe NameGetting Started with Kafka
2. Installing Kafka
First Things FirstChoosing an Operating SystemInstalling JavaInstalling ZookeeperInstalling a Kafka BrokerBroker ConfigurationGeneral BrokerTopic DefaultsHardware SelectionDisk ThroughputDisk CapacityMemoryNetworkingCPUKafka in the CloudKafka ClustersHow Many Brokers?Broker ConfigurationOS TuningProduction ConcernsGarbage Collector OptionsDatacenter LayoutColocating Applications on ZookeeperSummary
3. Kafka Producers: Writing Messages to Kafka
Producer OverviewConstructing a Kafka ProducerSending a Message to KafkaSending a Message SynchronouslySending a Message AsynchronouslyConfiguring ProducersSerializersCustom SerializersSerializing Using Apache AvroUsing Avro Records with KafkaPartitionsOld Producer APIsSummary
4. Kafka Consumers: Reading Data from Kafka
Kafka Consumer ConceptsConsumers and Consumer GroupsConsumer Groups and Partition RebalanceCreating a Kafka ConsumerSubscribing to TopicsThe Poll LoopConfiguring ConsumersCommits and OffsetsAutomatic CommitCommit Current OffsetAsynchronous CommitCombining Synchronous and Asynchronous CommitsCommit Specified OffsetRebalance ListenersConsuming Records with Specific OffsetsBut How Do We Exit?DeserializersStandalone Consumer: Why and How to Use a Consumer Without a GroupOlder Consumer APIsSummary
5. Kafka Internals
Cluster MembershipThe ControllerReplicationRequest ProcessingProduce RequestsFetch RequestsOther RequestsPhysical StoragePartition AllocationFile ManagementFile FormatIndexesCompactionHow Compaction WorksDeleted EventsWhen Are Topics Compacted?Summary
6. Reliable Data Delivery
Reliability GuaranteesReplicationBroker ConfigurationReplication FactorUnclean Leader ElectionMinimum In-Sync ReplicasUsing Producers in a Reliable SystemSend AcknowledgmentsConfiguring Producer RetriesAdditional Error HandlingUsing Consumers in a Reliable SystemImportant Consumer Configuration Properties for Reliable ProcessingExplicitly Committing Offsets in ConsumersValidating System ReliabilityValidating ConfigurationValidating ApplicationsMonitoring Reliability in ProductionSummary
7. Building Data Pipelines
Considerations When Building Data PipelinesTimelinessReliabilityHigh and Varying ThroughputData FormatsTransformationsSecurityFailure HandlingCoupling and AgilityWhen to Use Kafka Connect Versus Producer and ConsumerKafka ConnectRunning ConnectConnector Example: File Source and File SinkConnector Example: MySQL to ElasticsearchA Deeper Look at ConnectAlternatives to Kafka ConnectIngest Frameworks for Other DatastoresGUI-Based ETL ToolsStream-Processing FrameworksSummary
8. Cross-Cluster Data Mirroring
Use Cases of Cross-Cluster MirroringMulticluster ArchitecturesSome Realities of Cross-Datacenter CommunicationHub-and-Spokes ArchitectureActive-Active ArchitectureActive-Standby ArchitectureStretch ClustersApache Kafka’s MirrorMakerHow to ConfigureDeploying MirrorMaker in ProductionTuning MirrorMakerOther Cross-Cluster Mirroring SolutionsUber uReplicatorConfluent ReplicatorSummary

9. Administering Kafka
Topic OperationsCreating a New TopicAdding PartitionsDeleting a TopicListing All Topics in a ClusterDescribing Topic DetailsConsumer GroupsList and Describe GroupsDelete GroupOffset ManagementDynamic Configuration ChangesOverriding Topic Configuration DefaultsOverriding Client Configuration DefaultsDescribing Configuration OverridesRemoving Configuration OverridesPartition ManagementPreferred Replica ElectionChanging a Partition’s ReplicasChanging Replication FactorDumping Log SegmentsReplica VerificationConsuming and ProducingConsole ConsumerConsole ProducerClient ACLsUnsafe OperationsMoving the Cluster ControllerKilling a Partition MoveRemoving Topics to Be DeletedDeleting Topics ManuallySummary
10. Monitoring Kafka
Metric BasicsWhere Are the Metrics?Internal or External MeasurementsApplication Health ChecksMetric CoverageKafka Broker MetricsUnder-Replicated PartitionsBroker MetricsTopic and Partition MetricsJVM MonitoringOS MonitoringLoggingClient MonitoringProducer MetricsConsumer MetricsQuotasLag MonitoringEnd-to-End MonitoringSummary
11. Stream Processing
What Is Stream Processing?Stream-Processing ConceptsTimeStateStream-Table DualityTime WindowsStream-Processing Design PatternsSingle-Event ProcessingProcessing with Local StateMultiphase Processing/RepartitioningProcessing with External Lookup: Stream-Table JoinStreaming JoinOut-of-Sequence EventsReprocessingKafka Streams by ExampleWord CountStock Market StatisticsClick Stream EnrichmentKafka Streams: Architecture OverviewBuilding a TopologyScaling the TopologySurviving FailuresStream Processing Use CasesHow to Choose a Stream-Processing FrameworkSummary
A. Installing Kafka on Other Operating Systems
Installing on WindowsUsing Windows Subsystem for LinuxUsing Native JavaInstalling on MacOSUsing HomebrewInstalling Manually
Index

Content preview from Kafka: The Definitive Guide

Chapter 10. Monitoring Kafka

The Apache Kafka applications have numerous measurements for their operation—so many, in fact, that it can easily become confusing as to what is important to watch and what can be set aside. These range from simple metrics about the overall rate of traffic, to detailed timing metrics for every request type, to per-topic and per-partition metrics. They provide a detailed view into every operation in the broker, but they can also make you the bane of whomever is responsible for managing your monitoring system.

This section will detail the most critical metrics to monitor all the time, and how to respond to them. We’ll also describe some of the more important metrics to have on hand when debugging problems. This is not an exhaustive list of the metrics that are available, however, because the list changes frequently, and many will only be informative to a hardcore Kafka developer.

Metric Basics

Before getting into the specific metrics provided by the Kafka broker and clients, let’s discuss the basics of how to monitor Java applications and some best practices around monitoring and alerting. This will provide a basis for understanding how to monitor the applications and why the specific metrics described later in this chapter have been chosen as the most important.

Where Are the Metrics?

All of the metrics exposed by Kafka can be accessed via the Java Management Extensions (JMX) interface. The easiest way to use them in an external monitoring system ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

O’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.

Julian F.

Head of Cybersecurity

I wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.

Addison B.

Field Engineer

I’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.

Amir M.

Data Platform Tech Lead

I'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.

Mark W.

Embedded Software Engineer

Kafka: The Definitive Guide, 2nd Edition

Publisher Resources

ISBN: 9781491936153Errata Page

Cloud Computing

Data Engineering

Data Science

AI & ML

Programming Languages

Software Architecture

IT/Ops

Security

Design