Hands-On Machine Learning with Scikit-Learn and PyTorch
by Aurélien Géron
October 2025
Intermediate to advanced
878 pages
12h 53m
Chinese
O'Reilly Media, Inc.

Chapter 5. Decision Trees

This work has been translated using AI. Your feedback and comments are welcome: translation-feedback@oreilly.com

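Decision trees are versatile machine learning algorithms that can perform classification tasks, regression tasks, and even multioutput tasks. They are powerful algorithms, capable of fitting complex datasets. For example, in Chapter 2 you trained a DecisionTreeRegressor model on the California housing dataset, and it fit the training data perfectly (in fact, it overfit it).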

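Decision trees are also the fundamental components of random forests (see Chapter 6), which are among the most powerful machine learning algorithms available today.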

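In this chapter we will start by discussing how to train, visualize, and make predictions with decision trees. Then we will go through the CART training algorithm used by Scikit-Learn, and we will explore how to regularize trees and use them for regression tasks. Finally, we will discuss some of the limitations of decision trees.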

Training and Visualizing a Decision Tree

To understand decision trees, let's build one and look at how it makes predictions. The following code trains a DecisionTreeClassifier on the iris dataset (see Chapter 4):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris(as_frame=True)
X_iris = iris.data[["petal length (cm)", "petal width (cm)"]].values
y_iris = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_iris, y_iris)
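As a quick sanity check on the training step above, the following sketch refits the same kind of tree and inspects its size. It is an illustration rather than part of the original text: it assumes that columns 2 and 3 of the array returned by load_iris() are petal length and petal width, and it uses the estimator's get_depth() and get_n_leaves() methods.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# Assumption: columns 2 and 3 hold petal length and petal width (cm).
X_iris = iris.data[:, 2:]
y_iris = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_iris, y_iris)

depth = tree_clf.get_depth()        # length of the longest root-to-leaf path
n_leaves = tree_clf.get_n_leaves()  # number of leaf nodes in the fitted tree
print("depth:", depth, "leaves:", n_leaves)
```

Because max_depth=2 caps how far the tree can grow, the reported depth can never exceed 2, whatever the data looks like.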

You can visualize the trained decision tree by first using the export_graphviz() function to output a graph definition file called iris_tree.dot:

from sklearn.tree import export_graphviz

export_graphviz(
    tree_clf,
    out_file="iris_tree.dot",
    feature_names=["petal length (cm)", "petal width (cm)"],
    class_names=iris.target_names,
    rounded=True,
    filled=True,
)

Then you can use graphviz.Source.from_file() to load and display the file in a Jupyter notebook:

from graphviz import Source

Source.from_file("iris_tree.dot")

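Graphviz is an open source graph visualization software package. It also includes a dot command-line tool that converts .dot files to a variety of formats, such as PDF or PNG.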
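The conversion performed by the dot tool can also be scripted. The sketch below is one possible way to do it from Python, assuming the Graphviz command-line tools are installed and on the PATH. The helper name dot_to_png and the tiny example graph are hypothetical and exist only for illustration. The helper returns False instead of failing when dot is not available.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def dot_to_png(dot_path, png_path):
    """Convert a .dot file to PNG with Graphviz's `dot` tool (hypothetical helper)."""
    if shutil.which("dot") is None:
        return False  # Graphviz command-line tools were not found on the PATH
    # Equivalent to running: dot -Tpng <dot_path> -o <png_path>
    subprocess.run(
        ["dot", "-Tpng", str(dot_path), "-o", str(png_path)],
        check=True,
    )
    return Path(png_path).exists()


# A tiny hand-written graph so the sketch runs without a trained model.
work_dir = Path(tempfile.mkdtemp())
dot_file = work_dir / "example.dot"
dot_file.write_text("digraph G { root -> leaf; }\n")

converted = dot_to_png(dot_file, work_dir / "example.png")
print("converted:", converted)
```

To convert the tree exported earlier, the same helper could be called with "iris_tree.dot" as the input path.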

Your first decision tree looks like Figure 5-1.

A diagram of a decision tree for classifying iris species based on petal length and width, showing split nodes and leaf nodes with classification results for setosa, versicolor, and virginica.
Figure 5-1. Iris decision tree
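When a rendered image is not available, the structure of the same kind of tree can be printed as plain text. The sketch below swaps in scikit-learn's export_text() function, which is not the Graphviz-based approach used in this section. As an assumption, it selects the petal features by column index rather than by name.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["petal length (cm)", "petal width (cm)"]

iris = load_iris()
# Assumption: columns 2 and 3 hold petal length and petal width (cm).
X_iris = iris.data[:, 2:]
y_iris = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_iris, y_iris)

# Each indented line is either a split test or a leaf with its predicted class.
tree_text = export_text(tree_clf, feature_names=feature_names)
print(tree_text)
```

The printed class labels are the integer targets; iris.target_names maps them back to species names.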

Making Predictions

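Let's see how the tree represented in Figure 5-1 makes predictions. Suppose you find an iris flower and you want to classify it based on its petals. You start at the root node (depth 0, at the top): this node asks whether the flower's petal length is smaller than 2.45 cm. If it is, you move down to the root's left child node (depth 1, left). In this case, it is a ...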
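The walk from the root node described above can be sketched in code. This is an illustrative reimplementation, not scikit-learn's own prediction routine: the helper predict_by_walking is hypothetical, it reads the fitted tree's tree_ arrays (feature, threshold, children_left, children_right, value), and it assumes that a left-child index of -1 marks a leaf. Note that scikit-learn's split test is "feature value <= threshold".

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# Assumption: columns 2 and 3 hold petal length and petal width (cm).
X_iris = iris.data[:, 2:]
y_iris = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_iris, y_iris)


def predict_by_walking(fitted_tree, x):
    """Follow split tests from the root to a leaf (hypothetical helper)."""
    t = fitted_tree.tree_
    node = 0  # the root node is always node 0
    while t.children_left[node] != -1:  # -1 means this node has no children
        if x[t.feature[node]] <= t.threshold[node]:
            node = t.children_left[node]   # test passed: go to the left child
        else:
            node = t.children_right[node]  # test failed: go to the right child
    # The leaf stores one value per class; pick the largest.
    return int(np.argmax(t.value[node][0]))


flower = [1.4, 0.2]  # petal length and width in cm, a made-up short-petaled flower
predicted = predict_by_walking(tree_clf, flower)
print(iris.target_names[predicted])
```

For every training sample, this walk should agree with tree_clf.predict(), which is a convenient way to check the sketch.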

ISBN: 0642572270117