Python机器学习手册:从数据预处理到深度学习

by Chris Albon
July 2019
Intermediate to advanced
365 pages
8h 13m
Chinese
Publishing House of Electronics Industry
Chapter 4: Handling Numerical Data
Setting interaction_only to True restricts the generated features to interaction features only:
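# Assumes `features` was defined earlier; the output below implies each row is [2, 3]
from sklearn.preprocessing import PolynomialFeatures

# Keep x1, x2 and the product x1*x2; omit the squared terms and the bias column
interaction = PolynomialFeatures(degree=2,
                                 interaction_only=True, include_bias=False)
interaction.fit_transform(features)

array([[ 2.,  3.,  6.],
       [ 2.,  3.,  6.],
       [ 2.,  3.,  6.]])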
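Discussion
Polynomial features are needed when the relationship between a feature and the target (the value being predicted) is nonlinear. For example, age may be strongly related to health: in general, the older a person is, the poorer their health, but the relationship is not linear. We therefore encode the feature x by generating its higher-order forms (x², x³, and so on) to represent its nonlinear effect on the target.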
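As a minimal sketch of generating higher-order forms of a single feature, the example below applies PolynomialFeatures with degree=3 to one column. The variable names and the sample values are assumptions chosen for illustration; they do not come from the book.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical single feature x; the values are illustrative only
x = np.array([[1.0],
              [2.0],
              [3.0]])

# With one input column, degree=3 and include_bias=False
# produce the columns x, x^2 and x^3
poly = PolynomialFeatures(degree=3, include_bias=False)
x_poly = poly.fit_transform(x)

print(x_poly)
```

Each row of x_poly holds the original value followed by its square and its cube, so a linear model fitted on these columns can represent a curved relationship between x and the target.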
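Sometimes a feature affects the target only in combination with another feature. As a simple example, suppose we want to predict whether a cup of coffee is sweet from two features: whether the coffee was stirred and whether sugar was added. Neither feature alone tells us that the coffee is sweet, but the two together do. The coffee is sweet only if sugar was added and it was stirred, so the effects of the two features on the target (sweetness) depend on each other. We can encode this relationship with an interaction feature, which is the product of the two individual features.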
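The coffee example can be sketched with two binary columns. The array below is hypothetical data (column 0 for "stirred", column 1 for "sugar added"), and the variable names are assumptions made for this illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical binary features: column 0 = stirred, column 1 = sugar added
coffee = np.array([[0, 0],
                   [1, 0],
                   [0, 1],
                   [1, 1]])

# Manual interaction feature: the product is 1 only when both inputs are 1
stirred_and_sugar = coffee[:, 0] * coffee[:, 1]

# The same column via PolynomialFeatures: the output columns are
# stirred, sugar, stirred * sugar
interaction = PolynomialFeatures(degree=2,
                                 interaction_only=True, include_bias=False)
coffee_interaction = interaction.fit_transform(coffee)

print(stirred_and_sugar)
print(coffee_interaction[:, 2])
```

Only the last row, where the coffee was both stirred and sweetened, gets a nonzero interaction value, which is the dependence the two separate columns cannot express on their own.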
4.5 Transforming Features
Problem
You want to apply a custom transformation to one or more features.
Solution
In scikit-learn, use FunctionTransformer ...
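One way FunctionTransformer can be used is sketched below. This is an illustration under stated assumptions, not necessarily the book's own solution: the add_ten function is hypothetical, and the features array is assumed to match the rows of [2, 3] implied by the earlier output.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Assumed feature matrix, matching the rows implied by the earlier output
features = np.array([[2, 3],
                     [2, 3],
                     [2, 3]])

# Hypothetical custom transformation: add 10 to every value
def add_ten(x):
    return x + 10

# Wrap the function so it exposes the usual fit/transform interface
ten_transformer = FunctionTransformer(add_ten)
transformed = ten_transformer.fit_transform(features)

print(transformed)
```

Wrapping a plain function this way lets the custom step sit alongside other scikit-learn transformers, for example inside a Pipeline.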
ISBN: 9787121369629