Python机器学习基础教程 (Introduction to Machine Learning with Python)

by Andreas C. Müller, Sarah Guido
January 2018
Intermediate to advanced
301 pages
8h 54m
Chinese
Posts & Telecom Press
Representing Data and Engineering Features
Figure 4-2: Comparing linear regression and decision tree regression on binned features
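The dashed line and the solid line coincide exactly, which means that the linear regression model and the decision tree make exactly the same predictions. For each bin, both predict a constant value. Because the features are constant within each bin, any model must predict the same value for all points within a bin. Comparing what the models learned before and after binning the feature, we see that the linear model became more flexible, because it now has a different value for each bin, while the decision tree model became less flexible. Binning features generally does not improve tree-based models, because these models can learn to split the data at any position. In a sense, a decision tree can learn whichever binning is most useful for predicting on this data. In addition, decision trees look at multiple features at once, while binning is usually done on a single feature. The expressiveness of the linear model, however, increased greatly after the transformation.
If there are good reasons to use a linear model for a particular dataset (for example, because the dataset is very large and high-dimensional) but some features have a nonlinear relationship with the output, then binning is a good way to increase modeling power.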
4.3 Interaction Features and Polynomial Features
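Another way to enrich a feature representation, particularly for linear models, is to add interaction features and polynomial features of the original data. This kind of feature engineering is common in statistical modeling, but it is also used in many practical machine learning applications.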
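As a rough illustration of what these two kinds of features look like, the sketch below expands a tiny two-feature array with scikit-learn's `PolynomialFeatures`. The input values are made up for the example, and the choice of `degree=2` is an assumption, not something the text prescribes.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two samples with two original features, x0 and x1 (made-up values).
X = np.array([[2.0, 3.0],
              [1.0, -1.0]])

# degree=2 adds the squares (polynomial features) and the product x0 * x1
# (interaction feature); include_bias=False omits the constant column.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Column order: x0, x1, x0^2, x0 * x1, x1^2
print(X_poly)
```

A linear model fit on `X_poly` is still linear in its coefficients, but it can represent curved relationships in the original features.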
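As a first example, look again at Figure 4-2. The linear model learned a constant value for each bin in the wave dataset. We know, however, that linear models can learn not only offsets but also slopes. One way to add a slope to the linear model on the binned data is to add the original feature (the x-axis in the plot) back in ...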
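One possible reading of this idea, sketched under assumptions: stack the original feature next to the one-hot bin columns and fit a linear model on the combined array. The data is synthetic (not the book's wave dataset), and the names `X_binned` and `X_combined` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic one-dimensional data (an assumption; not the book's wave dataset).
rng = np.random.RandomState(0)
x = rng.uniform(-3, 3, size=120)
y = np.sin(4 * x) + x + rng.normal(scale=0.3, size=x.shape)

# One-hot encoding of 10 equal-width bins on [-3, 3].
n_bins = 10
edges = np.linspace(-3, 3, n_bins + 1)
X_binned = np.eye(n_bins)[np.digitize(x, bins=edges[1:-1])]

# Put the original feature back in as an extra first column.
X_combined = np.hstack([x.reshape(-1, 1), X_binned])

lin_binned = LinearRegression().fit(X_binned, y)
lin_combined = LinearRegression().fit(X_combined, y)

# The first coefficient acts as a slope; the remaining ones are per-bin offsets.
slope = lin_combined.coef_[0]
print(X_combined.shape, slope)
print(lin_binned.score(X_binned, y), lin_combined.score(X_combined, y))
```

In this sketch the slope is a single coefficient shared by all bins, because the original feature enters the model only once.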
ISBN: 9787115475619