Kafka权威指南(第2版)

by Gwen Shapira, Todd Palino, Rajini Sivaram, Krit Petty
November 2022
Beginner to intermediate
346 pages
11h
Chinese
Posts & Telecom Press
6.5.1 Tiered Storage
Starting in late 2018, the Kafka community launched an ambitious project to add tiered storage to Kafka, with release planned for version 3.0.
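The motivation is straightforward: Kafka is used to store large volumes of data, either because throughput is high or because data must be retained for a long time. That raises the following concerns: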
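• The amount of data a partition can store is limited. As a result, the number of partitions is driven not only by product requirements but also by physical disk size.
• The choice of disk and cluster size is driven by storage requirements. Clusters therefore often end up larger than they would be if latency and throughput were the main considerations, which increases cost.
• The time it takes to move partitions between brokers (when expanding or shrinking the cluster) is determined by partition size. Large partitions make the cluster less elastic. Today, with flexible cloud deployments available, architectures tend to be designed for greater elasticity.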
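To make the third point concrete, here is a rough back-of-the-envelope sketch. It assumes the copy time is simply the partition size divided by a fixed replication bandwidth; the function name and all numbers are hypothetical and are not taken from the book.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def reassignment_seconds(partition_bytes: int, bandwidth_bytes_per_sec: int) -> float:
    """Rough lower bound on the time to copy one partition replica to another broker."""
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return partition_bytes / bandwidth_bytes_per_sec


# Hypothetical sizes: 500 GiB held on local disk versus 20 GiB
# when only recent data stays on the broker.
full_local = reassignment_seconds(500 * GIB, 50 * MIB)
recent_only = reassignment_seconds(20 * GIB, 50 * MIB)
```

Under these assumed numbers, the larger partition needs about 2.8 hours to move and the smaller one about 7 minutes, which is why keeping less data on the broker makes the cluster easier to resize.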
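In the tiered storage architecture, a Kafka cluster is configured with two storage tiers: local and remote. The local tier is the same as the current Kafka storage layer: it uses the brokers' local disks to store log segments. The remote tier uses a dedicated storage system, such as HDFS or S3, to store log segments.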
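Kafka users can set a separate retention policy for each tier. Because local storage is usually far more expensive than remote storage, the local tier's retention period is typically a few hours or even shorter, while the remote tier's retention period can be much longer: days or even months.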
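The per-tier retention described above can be sketched as a small model. The excerpt does not name any configuration keys; the keys below (`remote.storage.enable`, `local.retention.ms`, `retention.ms`) follow the topic-level settings introduced by the tiered storage proposal (KIP-405) and should be checked against the Kafka version in use. The function `tier_for_age` and the chosen durations are hypothetical, and the model deliberately ignores size-based retention and the fact that recently rolled segments may exist in both tiers at once.

```python
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS

# Assumed topic-level settings; the values are illustrative only.
topic_config = {
    "remote.storage.enable": "true",
    "local.retention.ms": str(4 * HOUR_MS),   # keep a few hours on broker disks
    "retention.ms": str(30 * DAY_MS),         # keep a month overall
}


def tier_for_age(segment_age_ms: int, config: dict) -> str:
    """Return the tier expected to serve a log segment of the given age."""
    local_ms = int(config["local.retention.ms"])
    total_ms = int(config["retention.ms"])
    if local_ms > total_ms:
        raise ValueError("local retention cannot exceed total retention")
    if segment_age_ms > total_ms:
        return "expired"
    if segment_age_ms > local_ms:
        return "remote"
    return "local"
```

For example, with these settings `tier_for_age(2 * DAY_MS, topic_config)` falls past the local window but inside the overall one, so the segment would be served from the remote tier.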
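Local storage has significantly lower latency than remote storage. Latency-sensitive applications typically read from the tail of a partition, which is served from the local tier, so they benefit from Kafka's existing storage mechanisms, such as efficient use of the page cache. Applications that backfill data or recover from failures need older data and therefore read from the remote tier. ...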
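The read path described in this paragraph can be illustrated with a minimal model that routes a fetch by offset. This is a sketch under stated assumptions, not the broker's actual implementation: the `PartitionLog` class, its field names, and `fetch_source` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionLog:
    log_start_offset: int        # oldest offset still retained in any tier
    local_log_start_offset: int  # oldest offset still on the broker's disk
    log_end_offset: int          # offset the next record will receive


def fetch_source(log: PartitionLog, offset: int) -> str:
    """Decide which tier would serve a fetch that starts at the given offset."""
    if offset < log.log_start_offset or offset > log.log_end_offset:
        return "out_of_range"
    if offset >= log.local_log_start_offset:
        return "local"   # tail reads stay on local disk and the page cache
    return "remote"      # older data, e.g. backfill or recovery


log = PartitionLog(log_start_offset=0, local_log_start_offset=9_000, log_end_offset=10_000)
```

In this model, a consumer reading near the tail (say offset 9,950) is served locally, while a backfill job starting at offset 100 is routed to the remote tier.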
Publisher Resources

ISBN: 9787115601421