数据科学中的实用统计学(第2版) (Practical Statistics for Data Scientists, 2nd Edition)


by Peter Bruce, Andrew Bruce, Peter Gedeck
October 2021
Intermediate to advanced
289 pages
8h 31m
Chinese
Posts & Telecom Press
Statistical Machine Learning
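30, and min_samples_split is varied over the range 20 to 100. The GridSearchCV method in scikit-learn makes it easy to run cross-validation over every combination of these parameters (an exhaustive search) and then select the optimal parameter set based on cross-validated model performance.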
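A minimal sketch of that exhaustive search, assuming synthetic data from make_classification. The text above is truncated before it names the parameter whose range ends at 30, so treating it as max_depth with the hypothetical values 5, 10, 20, 30 is an assumption. The min_samples_split range of 20 to 100 comes from the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

param_grid = {
    # Hypothetical grid: the parameter ending at 30 is assumed to be max_depth.
    "max_depth": [5, 10, 20, 30],
    # min_samples_split from 20 to 100, as in the text.
    "min_samples_split": [20, 40, 60, 80, 100],
}

# Exhaustive search: each of the 4 x 5 = 20 combinations is scored
# with 5-fold cross-validation, then the best one is refit on all data.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)           # parameter set with the best mean CV accuracy
print(round(search.best_score_, 3))  # that set's mean cross-validated accuracy
```

Because refit defaults to True, search.best_estimator_ is afterwards a tree trained on the full data with the winning parameters.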
6.2.5 Predicting a Continuous Value
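Using a tree to predict a continuous value (also called regression) follows essentially the same logic and procedure as before. The difference is that impurity is measured by the squared deviations from the mean (squared errors) in each partition, and predictive performance is judged by the root mean squared error (RMSE) (see Section 4.2.2).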
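To make the squared-deviation impurity concrete, here is a minimal NumPy sketch for a single predictor. It is an illustration under stated assumptions, not scikit-learn's internal implementation. The helper names sse, best_split, and rmse and the six data points are hypothetical. Each candidate split is scored by the total squared deviation from the partition means, and the resulting two-leaf fit is scored with RMSE.

```python
import numpy as np

def sse(y):
    """Sum of squared deviations from the partition mean (regression impurity)."""
    if len(y) == 0:
        return 0.0
    return float(np.sum((y - y.mean()) ** 2))

def best_split(x, y):
    """Scan midpoints between sorted unique x values and return
    (threshold, impurity) for the split with the lowest total SSE."""
    values = np.unique(x)
    best_t, best_imp = None, sse(y)  # start from the unsplit node
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2
        imp = sse(y[x <= t]) + sse(y[x > t])
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp

def rmse(y_true, y_pred):
    """Root mean squared error of predictions."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up data with an obvious break between x = 3 and x = 10.
x = np.array([1, 2, 3, 10, 11, 12], dtype=float)
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])

threshold, impurity = best_split(x, y)
print(threshold)  # 6.5

# Each leaf predicts the mean of its partition; score the fit with RMSE.
pred = np.where(x <= threshold, y[x <= threshold].mean(), y[x > threshold].mean())
print(round(rmse(y, pred), 3))  # 0.163
```

Here the split at 6.5 leaves a total squared deviation of 0.16, far below the 24.16 of the unsplit node.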
In scikit-learn, the sklearn.tree.DecisionTreeRegressor class trains a decision tree regression model.
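A short sketch of DecisionTreeRegressor scored by cross-validated RMSE. The noisy sine-curve data and the max_depth=4 and min_samples_split=20 settings are illustrative assumptions. The regressor's default split criterion is squared error, which matches the impurity measure described above.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption): a noisy sine curve on one predictor.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)

# Illustrative hyperparameters; the default criterion is squared error.
model = DecisionTreeRegressor(max_depth=4, min_samples_split=20, random_state=0)

# scikit-learn maximizes scores, so RMSE is exposed as its negative.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
cv_rmse = -scores.mean()
print(round(cv_rmse, 3))

# Refit on all data and predict two new points (piecewise-constant output).
model.fit(X, y)
print(model.predict([[2.0], [7.5]]))
```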
6.2.6 How Trees Are Used
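A major obstacle for modelers doing predictive modeling in an organization is that their methods are essentially a "black box," which makes it hard for the modeling approach to work together with other parts of the organization. In this respect, tree models have two notable advantages.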
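• Tree models provide a visual tool for exploring the data. They show which variables are important and how the variables relate to one another, and trees can capture nonlinear relationships among the predictors.
• Tree models provide a set of rules that can be communicated effectively to nonspecialists. This helps both in implementing a data mining project and in "selling" it.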
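Both advantages can be seen directly in scikit-learn. In this sketch, export_text prints a fitted tree as indented if/then rules and feature_importances_ reports how much each predictor reduced impurity. Using the bundled iris dataset and a depth-2 tree is an illustrative assumption.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Human-readable rules: one indentation level per split, leaves show the class.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)

# Impurity-based importance of each predictor (normalized to sum to 1).
for name, imp in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

A shallow tree keeps the rule list short enough to hand to a nonspecialist. Deeper trees fit better but quickly become hard to read.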
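Predictions from multiple trees are usually better than those from a single tree. In particular, random forest and boosted tree algorithms almost always deliver higher predictive accuracy and better performance (see Sections 6.3 and 6.4), but they lose the single-tree advantages described above.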
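The trade-off can be checked empirically. This sketch compares the cross-validated accuracy of one unpruned tree with a random forest on synthetic data. The dataset and the n_estimators=100 setting are assumptions, and no particular outcome is claimed for them. The forest offers no single rule set to print, which is the interpretability cost the text mentions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption) with several informative predictors.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=1)

single_tree = DecisionTreeClassifier(random_state=1)
forest = RandomForestClassifier(n_estimators=100, random_state=1)

# Mean 5-fold accuracy (the default score for classifiers).
tree_acc = cross_val_score(single_tree, X, y, cv=5).mean()
forest_acc = cross_val_score(forest, X, y, cv=5).mean()
print(f"single tree:   {tree_acc:.3f}")
print(f"random forest: {forest_acc:.3f}")
```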
Key Ideas
• Decision trees produce a set of rules to classify or predict an outcome.
• ...

ISBN: 9787115569028