
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (Chinese translation of the 2nd edition)

Name: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (Chinese translation of the 2nd edition)
Author: Aurélien Géron
ISBN: 9787111665977


October 2020

Intermediate to advanced

693 pages


Chinese

China Machine Press


Figure 3-1. Digits from the MNIST dataset

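Likewise, we first shuffle the training set. This guarantees that all cross-validation folds will be similar (you don't want one fold to be missing some digits). Moreover, some learning algorithms are sensitive to the order of the training instances, and they perform poorly if they receive many similar instances in a row. Shuffling the dataset ensures that this does not happen (note 2).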
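Here is one way to do the shuffle, sketched with NumPy. Draw a random permutation of the row indices and apply that same permutation to both the features and the labels, so that every instance keeps its label. The small arrays below are hypothetical stand-ins for the MNIST training set, and the seed 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility

# Hypothetical stand-in data: 10 instances with 2 features each.
# Row i is [2*i, 2*i + 1] and its label is i, so alignment is easy to check.
X_train = np.arange(20).reshape(10, 2)
y_train = np.arange(10)

# One permutation of the row indices, applied to BOTH arrays
shuffle_index = rng.permutation(len(X_train))
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]

print(y_train)  # labels in a new random order
```

Indexing both arrays with one shared index array is the key design choice. Shuffling `X_train` and `y_train` independently would silently scramble the labels.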

3.2 Training a Binary Classifier

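Let's simplify the problem for now and try to identify only one digit, for example the number 5. This "5-detector" is an example of a binary classifier, capable of distinguishing between just two classes: 5 and not-5. First, let's create the target vectors for this classification task: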

y_train_5 = (y_train == 5)  # True for all 5s, False for all other digits
y_test_5 = (y_test == 5)
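The comparison is elementwise. Each target vector is therefore a boolean NumPy array with one entry per instance. The small sketch below uses hypothetical labels. It assumes the labels end up as integers: if a loader returns them as strings (as `fetch_openml` may do for MNIST), cast them first, because comparing a string such as `"5"` with the integer 5 is never True.

```python
import numpy as np

# Hypothetical labels, stored as strings as some dataset loaders return them
y_raw = np.array(["5", "0", "4", "1", "9", "2", "1", "3", "1", "4", "5"])
y_train = y_raw.astype(np.uint8)  # cast to integers so that == 5 can match

y_train_5 = (y_train == 5)  # True for all 5s, False for all other digits
print(y_train_5)
print(y_train_5.sum())  # number of positive (digit 5) instances
```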

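Next, let's pick a classifier and train it. A good place to start is a Stochastic Gradient Descent (SGD) classifier, using Scikit-Learn's SGDClassifier class. This classifier has the advantage of handling very large datasets efficiently. As we will see later, this is partly because SGD deals with training instances independently, one at a time, which also makes SGD well suited for online learning. Let's create an SGDClassifier and train it on the whole training set: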
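The following is a self-contained sketch of that step, not the book's own listing. To avoid a download, it uses scikit-learn's small bundled 8×8 digits dataset (`load_digits`) as a stand-in for MNIST. The 1,500/297 train/test split and `random_state=42` are arbitrary assumptions. The final loop illustrates the one-instance-at-a-time idea behind online learning: `partial_fit` is fed mini-batches of 100 instances (the batch size is also an arbitrary choice).

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Stand-in for MNIST: 1,797 grayscale 8x8 digit images, labels 0-9
X, y = load_digits(return_X_y=True)
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# Binary targets for the "5-detector"
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

# Batch training on the whole training set
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

pred = sgd_clf.predict(X_test)  # boolean predictions: 5 or not-5
accuracy = (pred == y_test_5).mean()
print("accuracy on held-out digits:", accuracy)

# Online-style training: feed the data in mini-batches with partial_fit.
# The full list of classes must be given on the first call, because a
# single mini-batch may not contain every class.
online_clf = SGDClassifier(random_state=42)
classes = np.array([False, True])
for start in range(0, len(X_train), 100):
    batch = slice(start, start + 100)
    online_clf.partial_fit(X_train[batch], y_train_5[batch], classes=classes)
```

About 90% of the instances are not 5s, so an always-"not 5" classifier already reaches roughly 90% accuracy. Accuracy alone is therefore a weak measure for this task.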

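Note 2: In some cases shuffling may not be a good idea, for example when you are working with time-series data (such as stock prices or weather conditions). We will explore this in the next chapter.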
