Building Machine Learning Pipelines

by Hannes Hapke, Catherine Nelson
May 2025
Intermediate to advanced
366 pages
4h 36m
Chinese
O'Reilly Media, Inc.

Chapter 3. Data Ingestion

This work has been translated using AI. We welcome your feedback and comments: translation-feedback@oreilly.com

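With the basic TFX setup and the ML MetadataStore in place, this chapter focuses on how to ingest datasets into a pipeline so that the various pipeline components can consume them, as shown in Figure 3-1.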

Figure 3-1. Data ingestion as part of ML pipelines

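TFX provides components for ingesting data from files or services. In this chapter, we outline the underlying concepts, explain ways to split datasets into training and evaluation subsets, and demonstrate how to combine multiple data exports into one all-encompassing dataset. We then discuss some strategies for ingesting different forms of data (structured data, text, and images) that have proven helpful in previous use cases.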

Concepts for Data Ingestion

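In this step of the pipeline, we read data files or request the data needed for our pipeline run from an external service (for example, Google Cloud BigQuery). Before passing the ingested dataset to the next component, we divide the available data into separate datasets (for example, training and validation datasets) and then convert them into TFRecord files that contain the data represented as tf.Example data structures.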
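
To make the target format concrete, the conversion step can be sketched directly in TensorFlow. This is a minimal illustration, not how ExampleGen is implemented: it assumes each record is already a flat Python dict, and the helper names (`row_to_example`, `write_tfrecord`) and the column names in the usage part are hypothetical. Integers map to an `Int64List`, floats to a `FloatList`, and every other value is encoded as UTF-8 bytes in a `BytesList`.

```python
import os
import tempfile

import tensorflow as tf


def _feature(value) -> tf.train.Feature:
    """Wrap one Python value in the matching tf.train.Feature type."""
    # bool is a subclass of int, so booleans also end up in an Int64List.
    if isinstance(value, int):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
    if isinstance(value, float):
        return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
    # Fallback: store anything else as UTF-8 encoded bytes.
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[str(value).encode("utf-8")]))


def row_to_example(row: dict) -> tf.train.Example:
    """Convert one flat record (column name -> value) into a tf.train.Example."""
    features = {name: _feature(value) for name, value in row.items()}
    return tf.train.Example(features=tf.train.Features(feature=features))


def write_tfrecord(rows, path: str) -> None:
    """Serialize every row as a tf.train.Example into a single TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for row in rows:
            writer.write(row_to_example(row).SerializeToString())


if __name__ == "__main__":
    # Hypothetical records; real column names depend on your dataset.
    rows = [
        {"product": "Bank account", "zip_code": 94107, "amount": 12.5},
        {"product": "Mortgage", "zip_code": 10001, "amount": 300.0},
    ]
    path = os.path.join(tempfile.mkdtemp(), "data.tfrecord")
    write_tfrecord(rows, path)
    # Read the file back and decode each serialized record.
    for raw in tf.data.TFRecordDataset(path):
        example = tf.train.Example.FromString(raw.numpy())
        print(example.features.feature["product"].bytes_list.value[0])
```

In a TFX pipeline, ExampleGen performs this conversion for you; writing records by hand is mainly useful for inspecting what the component's output files contain.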

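The ExampleGen component performs the ingestion, splitting, and conversion of the dataset. As we see in the following examples, datasets can be read from local and remote folders or requested from data services such as Google Cloud BigQuery.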
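
ExampleGen's default splitting is hash based: each record is hashed into one of a fixed number of buckets, and each split owns a share of those buckets, so a given record lands in the same split on every pipeline run. The pure-Python sketch below imitates that idea under stated assumptions; it is not TFX code. MD5 stands in for the fingerprint TFX actually computes, records are plain strings (for example, raw CSV lines), and `assign_split`, `split_records`, and `DEFAULT_SPLITS` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical default: two buckets for "train", one for "eval" (a 2:1 ratio).
DEFAULT_SPLITS: Tuple[Tuple[str, int], ...] = (("train", 2), ("eval", 1))


def assign_split(record: str,
                 splits: Sequence[Tuple[str, int]] = DEFAULT_SPLITS) -> str:
    """Deterministically map a record to a split name via hash buckets."""
    total_buckets = sum(buckets for _, buckets in splits)
    if total_buckets <= 0:
        raise ValueError("splits must contain a positive number of buckets")
    # MD5 is only a stand-in for TFX's own fingerprint; any stable hash works.
    digest = hashlib.md5(record.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total_buckets
    # Each split owns a contiguous range of bucket indices, in listed order.
    for name, buckets in splits:
        if bucket < buckets:
            return name
        bucket -= buckets
    raise AssertionError("unreachable: bucket index exceeded total_buckets")


def split_records(records: Iterable[str],
                  splits: Sequence[Tuple[str, int]] = DEFAULT_SPLITS
                  ) -> Dict[str, List[str]]:
    """Group records by the split each one hashes into."""
    result: Dict[str, List[str]] = {name: [] for name, _ in splits}
    for record in records:
        result[assign_split(record, splits)].append(record)
    return result


if __name__ == "__main__":
    rows = [f"complaint-{i},Mortgage,94107" for i in range(9)]
    parts = split_records(rows)
    print({name: len(items) for name, items in parts.items()})
```

In an actual pipeline you would not write this yourself: TFX lets you set the ratio by passing ExampleGen an `output_config` whose `example_gen_pb2.SplitConfig` lists `Split(name=..., hash_buckets=...)` entries, and without one ExampleGen emits `train` and `eval` splits in a 2:1 ratio, which the default `splits` argument above mirrors. Hashing instead of random sampling keeps splits stable when the pipeline reruns on the same data.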

Ingesting Local Data Files

The ExampleGen component can ingest a few data structures, including comma-separated value files (CSVs), precomputed ...

ISBN: 9798341659292