构建可扩展分布式系统:方法与实践

by Ian Gorton
May 2024
Intermediate
278 pages
5h 24m
Chinese
China Machine Press
Content preview from 构建可扩展分布式系统:方法与实践
Stream Processing Systems
15.1 Introduction to Stream Processing
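Ever since software systems first appeared, batch processing has played a major role in handling newly available data. In a batch processing system, raw data representing new and updated objects is accumulated into files. Periodically, a software component known as a batch data load job processes this newly available data and inserts it into the application's databases. This is commonly known as an extract, transform, load (ETL) process. ETL means processing the batch files that contain the new data, aggregating the data, and transforming it into a format suitable for insertion into the storage layer.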
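The ETL flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the CSV layout, the `listings` table, and the function names `extract`, `transform`, `load`, and `run_batch` are all hypothetical, and an in-memory SQLite database stands in for the application's storage layer.

```python
import csv
import io
import sqlite3

# Hypothetical batch file contents. In a real system this would be a file
# accumulated from upstream data sources over the batch interval.
BATCH_CSV = """listing_id,region,price
1,north,350000
2,south,420000
3,north,not_available
"""


def extract(text):
    """Extract: read the raw rows from the accumulated batch."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert fields to storage types, skipping unparseable rows."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append(
                (int(row["listing_id"]), row["region"], int(row["price"]))
            )
        except ValueError:
            continue  # malformed price or id; a real job would log this
    return cleaned


def load(conn, records):
    """Load: insert the transformed records into the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(listing_id INTEGER PRIMARY KEY, region TEXT, price INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO listings VALUES (?, ?, ?)", records
    )
    conn.commit()


def run_batch(conn, text):
    """Run one ETL pass and return the number of records loaded."""
    records = transform(extract(text))
    load(conn, records)
    return len(records)


conn = sqlite3.connect(":memory:")  # assumption: SQLite as the storage layer
loaded = run_batch(conn, BATCH_CSV)
```

In this sketch the third row is dropped during the transform step because its price cannot be parsed, so only two records reach the database. A scheduler would call `run_batch` once per batch interval.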
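Once a batch has been processed, the data is available to analytics and to external users. You can run queries against the database to produce useful insights from the newly inserted data. This scheme is shown in Figure 15-1.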
Figure 15-1: Batch processing (labels: data sources, updates, data store, analytics; minutes to hours)
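A good example of batch processing is a real estate website. All new listings, rentals, and sales are accumulated from various data sources into a batch. The batch is applied periodically to the underlying databases and then becomes visible to users. The new information also feeds analytics, such as how many new listings appear in each region each day, and the previous day's home sales.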
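As a rough illustration of the kind of analysis mentioned above, the following sketch counts new listings per region per day once a batch has been loaded. The record layout, the sample data, and the function name `new_listings_per_region_per_day` are assumptions made for this example only.

```python
from collections import Counter
from datetime import date

# Hypothetical listings that became visible after a batch was applied.
listings = [
    {"region": "north", "listed_on": date(2024, 5, 1)},
    {"region": "north", "listed_on": date(2024, 5, 1)},
    {"region": "south", "listed_on": date(2024, 5, 1)},
    {"region": "north", "listed_on": date(2024, 5, 2)},
]


def new_listings_per_region_per_day(rows):
    """Count listings keyed by (region, listing date)."""
    return Counter((row["region"], row["listed_on"]) for row in rows)


counts = new_listings_per_region_per_day(listings)
```

The result can only include data from batches that have already been applied, which is why such analytics lag behind the arrival of new listings.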
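Batch processing is a reliable, effective, and important component of large-scale systems. The downside, however, is the time lag between new data arriving and that data becoming available for querying and analysis. Once a batch of new data has been accumulated, the lag may be an hour or a day depending on the use case, and you must wait until:
The ETL job has ingested the new data into the repository.
The analysis jobs have completed.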
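The waiting described above can be put into rough numbers. The sketch below assumes a simple model that is not taken from the book: a record arriving just after a batch is cut waits one full batch interval, then the ETL job, then the analysis job. The function name `worst_case_lag` and all durations are hypothetical.

```python
from datetime import timedelta


def worst_case_lag(batch_interval, etl_duration, analysis_duration):
    """Upper bound on the wait before a new record appears in analysis results.

    Assumed model: the record just misses a batch, so it waits a full
    interval, then for the ETL job, then for the analysis job.
    """
    return batch_interval + etl_duration + analysis_duration


# Hypothetical hourly and daily batch schedules.
hourly = worst_case_lag(
    timedelta(hours=1), timedelta(minutes=10), timedelta(minutes=20)
)
daily = worst_case_lag(
    timedelta(days=1), timedelta(hours=2), timedelta(hours=1)
)
```

Under these assumed durations the hourly schedule gives a worst-case lag of 90 minutes and the daily schedule gives 27 hours.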
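Depending on scale, the whole process may take anywhere from a few minutes to a few hours to run. For many use cases that do not require absolutely real-time data, this is not a problem. If you put your house on the market and your listing does not appear on your favorite real estate site within a few hours, or even until the next day, it is not the end of the world. But if someone steals your credit card information, waiting up to ...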

Publisher Resources

ISBN: 9787111750697