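机器学习实战：基于 Scikit-Learn、Keras 和 TensorFlow（原书第 2 版） (Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition; Chinese translation)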

Author: Aurélien Géron
ISBN: 9787111665977

October 2020

Intermediate to advanced

693 pages

16h 26m

Chinese

China Machine Press


Chapter 13

Loading and Preprocessing Data with TensorFlow

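So far we have only used datasets that fit in memory, but deep learning systems are often trained on very large datasets that do not fit entirely in RAM. Reading a large dataset and preprocessing it efficiently can be hard to implement with other deep learning libraries, but TensorFlow makes it easy through the Data API: you create a dataset object and tell it where to get the data and how to transform it. TensorFlow takes care of the implementation details, such as multithreading, queuing, batching, and prefetching. In addition, the Data API works seamlessly with tf.keras.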
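The idea of "where to get the data and how to transform it" can be sketched with a tiny in-memory example. This is only an illustration, assuming TensorFlow 2.4 or later (for `tf.data.AUTOTUNE`); the toy values, the doubling transform, and the batch size are assumptions made for the sketch, not taken from the book.

```python
import tensorflow as tf

# Toy in-memory values standing in for a large dataset (illustrative only).
features = tf.range(10, dtype=tf.float32)

dataset = (
    tf.data.Dataset.from_tensor_slices(features)  # where the data comes from
    # How to transform each element; the map may run on several threads,
    # and element order is preserved by default.
    .map(lambda x: x * 2.0, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)      # group elements into batches of up to 4
    .prefetch(1)   # prepare the next batch while the current one is consumed
)

# Iterate over the pipeline and collect the batches as plain Python lists.
batches = [batch.numpy().tolist() for batch in dataset]
print(batches)
```

A dataset built this way can also be passed directly to a tf.keras model's `fit()` method, provided each element is a `(features, labels)` pair.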

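Out of the box, the Data API can read text files (such as CSV files), binary files with fixed-size records, and binary files in TensorFlow's TFRecord format, which supports records of varying sizes. TFRecord is a flexible and efficient binary format that typically holds protocol buffers (an open source binary serialization format). The Data API also supports reading from SQL databases, and many open source extensions can read from other kinds of data sources, such as Google's BigQuery service.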
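As a rough sketch of the TFRecord round trip described above, the snippet below writes two `tf.train.Example` protocol buffers to a temporary file and reads them back with `tf.data.TFRecordDataset`. The file name, the feature names (`name`, `size`), and the record contents are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical file location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")

# Write two records, each one a serialized Example protocol buffer.
with tf.io.TFRecordWriter(path) as writer:
    for name, size in [(b"a", 1), (b"bcd", 3)]:
        example = tf.train.Example(features=tf.train.Features(feature={
            "name": tf.train.Feature(bytes_list=tf.train.BytesList(value=[name])),
            "size": tf.train.Feature(int64_list=tf.train.Int64List(value=[size])),
        }))
        writer.write(example.SerializeToString())

# Describe how each serialized record should be parsed.
feature_spec = {
    "name": tf.io.FixedLenFeature([], tf.string),
    "size": tf.io.FixedLenFeature([], tf.int64),
}

# Read the file back and parse every record into a dict of tensors.
dataset = tf.data.TFRecordDataset([path]).map(
    lambda record: tf.io.parse_single_example(record, feature_spec)
)
parsed = [(ex["name"].numpy(), int(ex["size"].numpy())) for ex in dataset]
print(parsed)
```

Text files follow the same pattern with `tf.data.TextLineDataset`, which yields one string tensor per line.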

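Reading a huge dataset efficiently is not the only difficulty: the data also needs to be preprocessed, usually by normalizing it. Moreover, it does not always consist strictly of numerical fields: there may be text features, categorical features, and so on. These need to be encoded, for example with one-hot encoding, bag-of-words encoding, or embeddings (as we will see, an embedding is a trainable dense vector that represents a category or a token). One way to handle all this preprocessing is to write your own custom preprocessing layers; another is to use the standard preprocessing layers provided by Keras.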
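To make these encodings concrete, here is a minimal sketch using basic TensorFlow operations rather than the Keras preprocessing layers mentioned above. The feature values, the category list, and the embedding size of 2 are assumptions made for the example, and bag-of-words encoding is not shown.

```python
import tensorflow as tf

# Numeric feature: standardize to zero mean and unit variance.
values = tf.constant([1.0, 2.0, 3.0, 4.0])
mean = tf.reduce_mean(values)
std = tf.math.reduce_std(values)
standardized = (values - mean) / std

# Categorical feature: map each category to an integer index, then one-hot encode.
vocab = ["red", "green", "blue"]  # hypothetical categories
index_of = {category: i for i, category in enumerate(vocab)}
indices = tf.constant([index_of[c] for c in ["green", "blue", "red"]])
one_hot = tf.one_hot(indices, depth=len(vocab))

# Embedding: one trainable dense vector per category.
# The vectors are randomly initialized here and would be learned during training.
embedding = tf.keras.layers.Embedding(input_dim=len(vocab), output_dim=2)
embedded = embedding(indices)

print(one_hot.numpy())
print(embedded.shape)
```

One-hot vectors grow with the number of categories, while an embedding keeps a fixed, smaller size chosen by the user, which is one reason embeddings are often preferred for large vocabularies.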

In this chapter, we will cover ...
