数据工程之道:设计和构建健壮的数据系统 (Fundamentals of Data Engineering: Plan and Build Robust Data Systems)

by Joe Reis, Matt Housley
February 2024
Intermediate to advanced
370 pages
7h
Chinese
China Machine Press
Chapter 5. Data Generation in Source Systems
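Welcome to the first stage of the data engineering lifecycle: data generation in source systems. As described earlier, a data engineer's job is to take data from source systems, process it, and make it useful for serving downstream use cases. Before you can get raw data, though, you must understand where the data lives, how it is generated, and what its characteristics and quirks are.

This chapter covers some popular operational source system patterns and the most important types of source systems. Many source systems generate data today, and we cannot list them all exhaustively. We focus on the source systems that generate data and on what you should consider when working with them. We also discuss the undercurrents of data engineering and how they apply to this first stage of the data engineering lifecycle (see Figure 5-1).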
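[Figure: the data engineering lifecycle. Labels: generation, ingestion, transformation, serving, storage; analytics, machine learning, reverse ETL; undercurrents: security, data management, DataOps, data architecture, orchestration, software engineering.]
Figure 5-1. Source systems generate the data for the rest of the data engineering lifecycle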
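As data proliferates, especially with the rise of data sharing (discussed next), we expect the data engineer's role to shift heavily toward understanding the interplay between data sources and destinations. The most basic data …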
 

Publisher Resources

ISBN: 9787111745273