Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition

by Aurélien Géron
May 2025
Intermediate to advanced
864 pages
12h 32m
Chinese
O'Reilly Media, Inc.

Chapter 6. Decision Trees

This work has been translated using AI. We welcome your feedback and comments: translation-feedback@oreilly.com

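Decision trees are versatile machine learning algorithms that can perform classification, regression, and even multioutput tasks. They are powerful algorithms, capable of fitting complex datasets. For example, in Chapter 2 you trained a DecisionTreeRegressor model on the California housing dataset and fit it perfectly (actually, you overfit it).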
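The "perfect fit" remark can be made concrete with a minimal sketch. It uses synthetic data rather than the California housing dataset. The noisy sine curve, the sample size, and the seeds are arbitrary assumptions chosen for illustration. With no regularization, a DecisionTreeRegressor keeps splitting until each leaf holds a single training instance. It therefore reproduces the training targets almost exactly, yet scores noticeably worse on held-out data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data (an assumption, not from the book): a noisy sine wave
rng = np.random.default_rng(42)
X = rng.uniform(0, 6, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# No max_depth or other constraint: the tree grows until every leaf is pure
tree_reg = DecisionTreeRegressor(random_state=42)
tree_reg.fit(X_train, y_train)

train_r2 = tree_reg.score(X_train, y_train)  # R² on the data it memorized
test_r2 = tree_reg.score(X_test, y_test)     # R² on unseen data
print(f"train R²: {train_r2:.3f}, test R²: {test_r2:.3f}")
```

The exact scores depend on the random data. The point to notice is the gap between the training and test scores.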

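Decision trees are also the fundamental components of random forests (see Chapter 7), which are among the most powerful machine learning algorithms available today.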

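In this chapter we will start by discussing how to train, visualize, and make predictions with decision trees. Then we will go through the CART training algorithm used by Scikit-Learn, and we will explore how to regularize trees and use them for regression tasks. Finally, we will discuss some of the limitations of decision trees.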

Training and Visualizing a Decision Tree

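To understand decision trees, let's build one and take a look at how it makes predictions. The following code trains a DecisionTreeClassifier on the iris dataset (see Chapter 4):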

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

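iris = load_iris(as_frame=True)
# Keep only the two petal features, as a NumPy array
X_iris = iris.data[["petal length (cm)", "petal width (cm)"]].values
y_iris = iris.target

# max_depth=2 limits the tree to two levels of splits below the root
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_iris, y_iris)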
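If Graphviz is not installed, a plain-text view of the same fitted tree makes a quick sanity check. This sketch uses Scikit-Learn's export_text() together with the get_depth() and get_n_leaves() methods. It is a convenience alternative, not the visualization route this chapter follows. The block refits the same tree so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris(as_frame=True)
feature_names = ["petal length (cm)", "petal width (cm)"]
X_iris = iris.data[feature_names].values
y_iris = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_iris, y_iris)

# max_depth=2 caps how many questions can be asked on any root-to-leaf path
print("depth:", tree_clf.get_depth(), "leaves:", tree_clf.get_n_leaves())

# Indented text rendering of every split and leaf
text_view = export_text(tree_clf, feature_names=feature_names)
print(text_view)
```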

First, you can visualize the trained decision tree by using the export_graphviz() function to output a graph definition file called iris_tree.dot:

from sklearn.tree import export_graphviz

export_graphviz(
    tree_clf,
    out_file="iris_tree.dot",
    feature_names=["petal length (cm)", "petal width (cm)"],
    class_names=iris.target_names,
    rounded=True,
    filled=True
)
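A .dot file is plain text written in the DOT graph-description language. Passing out_file=None makes export_graphviz() return that text as a string instead of writing a file. This is a handy way to inspect what iris_tree.dot contains before any Graphviz tooling is involved. The block refits the tree so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris(as_frame=True)
feature_names = ["petal length (cm)", "petal width (cm)"]
X_iris = iris.data[feature_names].values
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_iris, iris.target)

# out_file=None: return the DOT source as a string rather than writing a file
dot_data = export_graphviz(
    tree_clf,
    out_file=None,
    feature_names=feature_names,
    class_names=iris.target_names,
    rounded=True,
    filled=True,
)

# The first lines declare the graph and its default node/edge styling
print("\n".join(dot_data.splitlines()[:5]))
```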

Then you can use graphviz.Source.from_file() to load and display the file in a Jupyter notebook:

from graphviz import Source

Source.from_file("iris_tree.dot")

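Graphviz is an open source graph visualization software package. It also includes a dot command-line tool that converts .dot files to a variety of formats, such as PDF or PNG.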
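The command-line route can be sketched as follows. The standard dot invocation is `dot -T<format> input.dot -o output`. The helper name dot_command() is hypothetical. The block only runs the conversion if the dot executable is on your PATH and iris_tree.dot exists, because the Graphviz binaries are installed separately from the Python graphviz package.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(src: str, fmt: str = "png") -> list[str]:
    """Hypothetical helper: build a `dot` command converting `src` to `fmt`.

    The output file reuses the input name with its suffix swapped for `fmt`.
    """
    out = str(Path(src).with_suffix(f".{fmt}"))
    return ["dot", f"-T{fmt}", src, "-o", out]


cmd = dot_command("iris_tree.dot", "png")
print(" ".join(cmd))

# Only attempt the conversion if Graphviz's dot binary and the input file exist
if shutil.which("dot") is not None and Path("iris_tree.dot").exists():
    subprocess.run(cmd, check=True)
```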

Your first decision tree looks like Figure 6-1.

Figure 6-1. Iris decision tree

Making Predictions

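Let's see how the tree represented in Figure 6-1 makes predictions. Suppose you find an iris flower and want to classify it based on its petals. You start at the root node (depth 0, at the top). This node asks whether the flower's petal length is smaller than 2.45 cm. If it is, you move down to the root's left child node (depth 1, left). In this case, it is a leaf node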
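The walk described above can be reproduced by hand from the fitted tree's low-level tree_ arrays. This is a minimal sketch for intuition that assumes the same two petal features. The helper name walk_tree() and the sample flower are hypothetical; in practice you would simply call predict(). At each split node, Scikit-Learn sends a sample left when its feature value is less than or equal to the node's threshold. A node whose children_left entry is -1 is a leaf.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris(as_frame=True)
feature_names = ["petal length (cm)", "petal width (cm)"]
X_iris = iris.data[feature_names].values
y_iris = iris.target
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_iris, y_iris)


def walk_tree(clf, x):
    """Hypothetical helper: follow one sample from the root down to a leaf."""
    t = clf.tree_
    node = 0  # start at the root (depth 0)
    while t.children_left[node] != -1:  # -1 marks a leaf
        f, thr = t.feature[node], t.threshold[node]
        # The question asked at this node: is feature f <= threshold?
        node = t.children_left[node] if x[f] <= thr else t.children_right[node]
    # Predict the class with the largest share of training samples in the leaf
    return clf.classes_[np.argmax(t.value[node][0])]


t = tree_clf.tree_
print("root question:", feature_names[t.feature[0]], "<=", round(t.threshold[0], 2))

flower = np.array([2.0, 0.5])  # hypothetical small-petaled iris
print(walk_tree(tree_clf, flower), tree_clf.predict([flower])[0])
```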


ISBN: 9798341656598