Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition (Chinese edition: 机器学习实战:基于Scikit-Learn、Keras 和TensorFlow (原书第2 版))

by Aurélien Géron
October 2020
Intermediate to advanced
693 pages
16h 26m
Chinese
China Machine Press
Chapter 13. Loading and Preprocessing Data with TensorFlow
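So far we have used only datasets that fit in memory, but Deep Learning systems are often trained on very large datasets that do not fit entirely in RAM. Reading a large dataset and preprocessing it efficiently can be hard to implement with other Deep Learning libraries, but TensorFlow makes it easy thanks to the Data API: you just create a dataset object and tell it where to get the data and how to transform it. TensorFlow takes care of all the implementation details, such as multithreading, queuing, batching, and prefetching. Moreover, the Data API works seamlessly with tf.keras!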
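As a minimal sketch of the idea described above (create a dataset object, say where the data comes from, say how to transform it), the following uses a tiny in-memory array as a stand-in for a large dataset. The toy values, the scaling factor, and the batch size are assumptions made for illustration only.

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for a large dataset (hypothetical values).
features = np.arange(10, dtype=np.float32).reshape(-1, 1)
labels = (features[:, 0] > 4).astype(np.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))  # where the data comes from
    .map(lambda x, y: (x / 9.0, y), num_parallel_calls=2)    # how to transform it
    .shuffle(buffer_size=10, seed=42)                        # randomize the order
    .batch(4)                                                # group items into batches
    .prefetch(1)                                             # prepare the next batch in advance
)

# Iterating over the dataset yields (features, labels) batches as tensors.
batches = [(x.numpy(), y.numpy()) for x, y in dataset]
```

A dataset built this way can be passed directly to the `fit()` method of a tf.keras model, which is what "works seamlessly" amounts to in practice.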
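Off the shelf, the Data API can read text files (such as CSV files), binary files with fixed-size records, and binary files that use TensorFlow's TFRecord format, which supports records of varying sizes. TFRecord is a flexible and efficient binary format that typically contains protocol buffers (an open source binary format). The Data API also supports reading from SQL databases. In addition, many open source extensions can read from all sorts of data sources, such as Google's BigQuery service.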
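To make the point about variable-size records concrete, here is a small sketch that writes three byte strings of different lengths to a TFRecord file and reads them back. The file name and the record contents are hypothetical. Real pipelines typically store serialized protocol buffers rather than raw strings, which this sketch does not attempt.

```python
import os
import tempfile
import tensorflow as tf

# Hypothetical file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")

# Records of different lengths; TFRecord does not require a fixed size.
records = [b"short", b"a somewhat longer record", b"x"]

with tf.io.TFRecordWriter(path) as writer:
    for record in records:
        writer.write(record)

# Each element of the dataset is a scalar string tensor holding one record.
dataset = tf.data.TFRecordDataset([path])
read_back = [item.numpy() for item in dataset]
```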
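Reading huge datasets efficiently is not the only difficulty: the data also needs to be preprocessed, usually normalized. Moreover, it is not always composed strictly of numerical fields: there may be text features, categorical features, and so on. These need to be encoded, for example using one-hot encoding, bag-of-words encoding, or embeddings (as we will see, an embedding is a trainable dense vector that represents a category or token). One way to handle all this preprocessing is to write your own custom preprocessing layers. Another is to use the standard preprocessing layers provided by Keras.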
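The encodings named above can be illustrated in plain Python, independent of any Keras layer. This is only a sketch: the function names, the vocabularies, and the choice of standardization (zero mean, unit variance) as the form of normalization are assumptions, and embeddings are left out because they are learned during training.

```python
from collections import Counter
from statistics import mean, pstdev


def standardize(values):
    """Scale numbers to zero mean and unit variance (one common normalization)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(category, vocabulary):
    """Return a vector with a 1 at the category's position and 0 elsewhere."""
    return [1 if category == item else 0 for item in vocabulary]


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


scaled = standardize([1.0, 2.0, 3.0])
encoded_category = one_hot("INLAND", ["<1H OCEAN", "INLAND", "NEAR BAY"])
encoded_text = bag_of_words("more cats more fun", ["cats", "dogs", "more"])
```

Note that a one-hot vector grows with the size of the vocabulary, which is one reason dense embeddings are often preferred for large vocabularies.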
In this chapter, we will cover ...
Publisher Resources

ISBN: 9787111665977