数据工程之道：设计和构建健壮的数据系统

ISBN: 9787111745273

by Joe Reis, Matt Housley

February 2024

Intermediate to advanced

370 pages

Chinese

China Machine Press

Content preview from 数据工程之道：设计和构建健壮的数据系统

Queries, Modeling, and Transformation

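We have long been able to achieve this by combining a streaming store (such as Kafka) with a stream processor (such as Flink). But creating a streaming DAG (directed acyclic graph) this way amounted to building a complex Rube Goldberg machine, with many topics and processing jobs wired together. Pulsar dramatically simplifies this process by treating the DAG as a core streaming abstraction. Rather than managing flows across several systems, engineers can define their streaming DAGs as code inside a single system.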
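To make "a streaming DAG defined as code inside a single system" concrete, here is a minimal sketch in plain Python. It is not Pulsar's API: the stage functions (`parse`, `enrich`, `flag_large`), the `DAG` mapping, and `run_pipeline` are hypothetical names invented for illustration, and the standard-library `graphlib.TopologicalSorter` stands in for whatever scheduler a real system would use.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each takes one event (a dict) and returns a new dict.
def parse(event):
    return {**event, "amount": float(event["amount"])}

def enrich(event):
    # Assumed static enrichment, for illustration only.
    return {**event, "currency": "USD"}

def flag_large(event):
    return {**event, "large": event["amount"] >= 100.0}

# The DAG as data: each stage maps to the set of stages it depends on.
DAG = {"parse": set(), "enrich": {"parse"}, "flag_large": {"enrich"}}
STAGES = {"parse": parse, "enrich": enrich, "flag_large": flag_large}

def run_pipeline(event):
    """Push one event through every stage in dependency order."""
    for name in TopologicalSorter(DAG).static_order():
        event = STAGES[name](event)
    return event

result = run_pipeline({"amount": "150"})
```

The point of the sketch is only that the whole graph lives in one place as code; in the Kafka-plus-Flink arrangement described above, the same edges would be spread across separate topics and jobs.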

Micro-batch versus true streaming

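There has been a long-running debate between the micro-batch and true streaming approaches. Fundamentally, it is important to understand your use case, your performance requirements, and the performance capabilities of the framework in question.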

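Micro-batching is a way of applying a batch-oriented framework to a stream. A micro-batch might run anywhere from every two minutes to every second. Some micro-batch frameworks (such as Apache Spark Streaming) are designed for this use case and perform well at a high batch frequency when resources are allocated appropriately. (In truth, DBAs and engineers have long done micro-batching with more traditional databases, which often led to horrific performance and resource consumption.)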
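As a rough illustration of the micro-batch idea, the sketch below groups a timestamped stream into fixed-length intervals and hands each interval to the caller as one batch. It uses only the Python standard library and does not reflect how Spark Streaming is implemented; the function name `micro_batches`, the `(timestamp, payload)` event shape, and the assumption that events arrive in timestamp order are all choices made for this example.

```python
from itertools import groupby

def micro_batches(events, interval_s):
    """Yield (batch_start, [events]) for fixed-length intervals.

    `events` is an iterable of (timestamp_seconds, payload) tuples,
    assumed to arrive in timestamp order.
    """
    def bucket(event):
        ts, _ = event
        # Start of the interval that this timestamp falls into.
        return int(ts // interval_s) * interval_s

    for start, group in groupby(events, key=bucket):
        yield start, list(group)

events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]

# One-second micro-batches: three batches for this input.
batches = list(micro_batches(events, interval_s=1))

# Two-minute micro-batches: everything lands in a single batch.
coarse = list(micro_batches(events, interval_s=120))
```

Changing `interval_s` is the knob the text refers to as batch frequency: a smaller interval lowers latency but produces more, smaller batches to schedule.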

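True streaming systems (for example, Beam and Flink) are designed to process one event at a time. However, this comes with significant overhead. It is also worth noting that even in these true streaming systems, many processes still run in batches. A basic enrichment process that adds data to individual events can handle one event at a time with low latency. A triggered metric computed over windows, however, might run every few seconds or every few minutes.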
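The contrast between per-event enrichment and a windowed metric can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Beam or Flink code: `enrich`, `windowed_totals`, the event fields (`ts`, `store_id`, `amount`), and the lookup table `store_names` are hypothetical, and events are assumed to arrive in timestamp order.

```python
def enrich(event, store_names):
    """Per-event enrichment: add a store name to a single sale event."""
    return {**event, "store": store_names.get(event["store_id"], "unknown")}

def windowed_totals(events, window_s):
    """Yield (window_start, total) each time a tumbling window closes.

    Events are assumed to arrive in timestamp order; the final,
    still-open window is flushed when the input ends.
    """
    current, total = None, 0.0
    for event in events:
        start = int(event["ts"] // window_s) * window_s
        if current is not None and start != current:
            yield current, total  # the previous window has closed
            total = 0.0
        current = start
        total += event["amount"]
    if current is not None:
        yield current, total

store_names = {1: "Downtown"}
sales = [
    {"ts": 1, "store_id": 1, "amount": 20.0},
    {"ts": 4, "store_id": 2, "amount": 5.0},
    {"ts": 12, "store_id": 1, "amount": 7.5},
]

# Enrichment produces one output per input event, immediately.
enriched = [enrich(e, store_names) for e in sales]

# The metric is only emitted when a 10-second window closes.
totals = list(windowed_totals(enriched, window_s=10))
```

`enrich` can emit its result as soon as an event arrives, whereas `windowed_totals` cannot emit anything for a window until an event from a later window (or the end of input) shows that the window is complete, which is the batch-like behavior described above.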

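When you use windows and triggers (that is, batch processing), how often is the window updated? What latency is acceptable? If you are collecting Black Friday sales metrics that are published every few minutes, micro-batches will work well as long as you set an appropriate micro-batch frequency ...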
