Kafka权威指南(第2版)

by Gwen Shapira, Todd Palino, Rajini Sivaram, Krit Petty
November 2022
Beginner to intermediate
346 pages
11h
Chinese
Posts & Telecom Press
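This kind of multiphase processing should be familiar to anyone who has written map-reduce code, because such code often uses several reduce steps. If you have written map-reduce code, you will know that each reduce step needs its own application. Unlike map-reduce, most stream processing frameworks let you put all the steps in a single application, and the framework decides which application instance (or worker) runs which step.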
14.3.4 Processing with External Lookup: Stream-Table Join
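Sometimes stream processing has to integrate data that lives outside the stream, for example validating transactions against rules stored in an external database, or enriching a stream of click events with user information.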
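Enrichment through an external lookup can work like this: for every click event in the stream, look up the matching user in the user profile table, generate a new event that contains the original event plus the user's age and gender, and publish that new event to another topic, as shown in Figure 14-6.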
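[Figure 14-6: Stream processing that includes an external data source. Elements: click events topic, enrichment step, user profile database, enriched click events topic.]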
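The per-event lookup described above can be sketched as follows. This is a minimal in-memory illustration, not a Kafka client: the dictionary `USER_PROFILES` stands in for the external database, plain lists stand in for the two topics, and every name used here (`lookup_profile`, `enrich_click`, `user_id`, `page`) is hypothetical.

```python
from typing import Optional

# Hypothetical stand-in for the external user profile database.
USER_PROFILES = {
    "u1": {"age": 34, "gender": "F"},
    "u2": {"age": 27, "gender": "M"},
}


def lookup_profile(user_id: str) -> Optional[dict]:
    """Simulate one external lookup; a real call would cross the network."""
    return USER_PROFILES.get(user_id)


def enrich_click(click: dict) -> dict:
    """Return a new event with the original click plus age and gender."""
    profile = lookup_profile(click["user_id"]) or {}
    return {
        **click,
        "age": profile.get("age"),
        "gender": profile.get("gender"),
    }


def process(click_events: list) -> list:
    """Enrich every click; the result stands in for the output topic."""
    return [enrich_click(click) for click in click_events]


clicks = [
    {"user_id": "u1", "page": "/home"},
    {"user_id": "u3", "page": "/cart"},  # no profile stored for u3
]
enriched_clicks = process(clicks)
```

In a real deployment each call to `lookup_profile` would be a round trip to the database, and the application would also need a policy for users that have no stored profile (here the fields are simply left as `None`).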
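The biggest problem with this approach is that an external lookup adds significant latency to the processing of every record, usually 5 to 15 milliseconds. In many situations that is not feasible. The extra load placed on the external data store is also unacceptable: a stream processing system can handle 100,000 to 500,000 events per second, whereas a database can normally handle only about 10,000 events per second. It also adds complexity around availability, because the application needs to know what to do when the external store is unavailable.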
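A rough back-of-the-envelope calculation makes the mismatch concrete. The sketch below uses only the figures quoted above and assumes, for simplicity, that a single thread performs blocking lookups one at a time with no batching or parallelism; the function and variable names are illustrative.

```python
def max_sequential_throughput(latency_ms: float) -> float:
    """Records per second one thread can process if each record waits
    for a blocking lookup that takes latency_ms milliseconds."""
    return 1000.0 / latency_ms


# Figures quoted in the text.
LOOKUP_LATENCY_MS = (5, 15)
STREAM_EVENTS_PER_SEC = (100_000, 500_000)
DATABASE_EVENTS_PER_SEC = 10_000

best_case = max_sequential_throughput(LOOKUP_LATENCY_MS[0])   # 200 records/s
worst_case = max_sequential_throughput(LOOKUP_LATENCY_MS[1])  # about 66.7 records/s

# How many times the stream rate exceeds what the database can serve
# if every event triggers one lookup.
overload_low = STREAM_EVENTS_PER_SEC[0] / DATABASE_EVENTS_PER_SEC   # 10x
overload_high = STREAM_EVENTS_PER_SEC[1] / DATABASE_EVENTS_PER_SEC  # 50x

print(f"one thread: {worst_case:.0f} to {best_case:.0f} records/s")
print(f"database overload: {overload_low:.0f}x to {overload_high:.0f}x")
```

Under these assumptions a single blocking thread manages at most a few hundred records per second, and one lookup per event would ask the database for 10 to 50 times its normal capacity.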
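To get better performance and scalability, the stream processing application needs to cache the information it reads from the database. Managing that cache is a significant problem in its own right, though. For example, how do you make sure the data in the cache is up to date? If you refresh too ...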
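One common way to cache lookups is a read-through cache with a time-to-live (TTL). The sketch below is an assumption made for illustration, not necessarily the approach the book goes on to describe: `TtlCache`, `loader`, and `ttl_seconds` are hypothetical names, and the clock is injectable so that expiry can be exercised without sleeping.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Read-through cache: entries older than ttl_seconds are reloaded."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # hypothetical database lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # fresh enough: no database call
        value = self._loader(key)      # missing or stale: reload
        self._entries[key] = (now, value)
        return value


# Usage with a fake database and a fake clock.
database = {"u1": {"age": 34}}
db_calls = []


def load_from_database(user_id: str) -> Any:
    db_calls.append(user_id)
    return dict(database[user_id])     # copy, like a row fetched over the wire


fake_now = [0.0]
cache = TtlCache(load_from_database, ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get("u1")        # miss: loads from the database
database["u1"]["age"] = 35     # the database changes
second = cache.get("u1")       # hit: still returns the stale age
fake_now[0] = 61.0             # advance past the TTL
third = cache.get("u1")        # expired: reloads the new age
```

The second read shows the freshness problem directly: until the entry expires, the application keeps serving the old value. A shorter TTL narrows that window but sends more reads to the database, and a longer TTL does the opposite.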
Publisher Resources

ISBN: 9787115601421