精通特征工程 (Mastering Feature Engineering)


by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
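Figure 2-18: Raw and scaled word counts of news articles. Note that only the scale of the x-axis changes; after feature scaling, the shape of the distribution stays the same.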
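Feature scaling is needed when a set of input features differs greatly in scale. For example, a popular e-commerce website might receive hundreds of thousands of visits per day, while actual purchases number only in the thousands. If a model uses both features, it must reconcile their scales while learning how to use them. Input features whose scales differ by orders of magnitude can also cause numerical-stability problems for the model training algorithm. In such cases, the features should be standardized. Chapter 4 covers feature scaling applied to natural text in detail and gives several usage examples.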
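As an illustration of the scaling problem described above, here is a minimal sketch using only the Python standard library. It shows two common rescalings, min-max scaling and standardization (subtract the mean, divide by the standard deviation). The visit and purchase counts are hypothetical numbers chosen for illustration, and the helper names `min_max_scale` and `standardize` are introduced here, not taken from the book.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Linearly rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def standardize(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical daily counts for a busy e-commerce site (illustrative only).
daily_visits = [120_000, 95_000, 310_000, 180_000, 150_000]
daily_purchases = [2_400, 1_900, 6_100, 3_500, 3_100]

scaled_visits = standardize(daily_visits)
scaled_purchases = standardize(daily_purchases)

# After standardization both features share the same scale (mean 0, std 1),
# so neither dominates simply because of the size of its raw numbers.
print([round(v, 2) for v in scaled_visits])
print([round(v, 2) for v in scaled_purchases])
```

Both transforms are linear and order-preserving, which is why, as in Figure 2-18, only the axis scale changes while the shape of the distribution does not.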
2.5 Interaction Features
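The product of two features forms a simple pairwise interaction feature. The multiplication is analogous to the logical operator AND: it represents an outcome determined by a pair of conditions, such as "the purchase came from zip code 98121" AND "the user is between 18 and 35 years old." Such features are extremely common in decision-tree-based models and are also frequently used in generalized linear models.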
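To make the AND analogy concrete, here is a minimal sketch in plain Python. It assumes each condition from the example is first encoded as a 0/1 indicator. Once that is done, the product of the two indicators is 1 exactly when both conditions hold. The purchase records are hypothetical, and treating "between 18 and 35" as inclusive on both ends is an assumption.

```python
def interaction(a, b):
    """Pairwise interaction feature: the elementwise product of two features."""
    if len(a) != len(b):
        raise ValueError("features must have the same length")
    return [x * y for x, y in zip(a, b)]


# Hypothetical purchase records: (zip_code, user_age).
purchases = [("98121", 25), ("98121", 42), ("10001", 30), ("98121", 18)]

# Encode each condition as a 0/1 indicator feature.
# Assumption: "between 18 and 35" is treated as inclusive on both ends.
from_98121 = [1 if zip_code == "98121" else 0 for zip_code, _ in purchases]
age_18_to_35 = [1 if 18 <= age <= 35 else 0 for _, age in purchases]

# For 0/1 indicators, the product is 1 exactly when both conditions hold (logical AND).
both = interaction(from_98121, age_18_to_35)
print(both)  # [1, 0, 0, 1]
```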
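A simple linear model predicts the outcome variable $y$ as a linear combination of the independent input features $x_1, x_2, \ldots, x_n$:

$$y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$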
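The linear model above can be written as a weighted sum of the inputs. The following is a minimal sketch that evaluates it directly. The weights and inputs are hypothetical, and no bias term is included, to match the equation as given.

```python
def linear_predict(weights, features):
    """Compute y = w_1*x_1 + w_2*x_2 + ... + w_n*x_n (no bias term)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length n")
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical weights and inputs for n = 3 features.
y = linear_predict([0.5, -1.0, 2.0], [4.0, 1.0, 3.0])
print(y)  # 7.0, i.e. 0.5*4 - 1*1 + 2*3
```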
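The linear model is easy to extend so that it also includes pairwise combinations of the input features, as follows:

$$y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{1,1} x_1 x_1 + w_{1,2} x_1 x_2 + w_{1,3} x_1 x_3 + \cdots$$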
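The extended model is still linear, but in an expanded feature vector. The following minimal sketch builds that vector. It assumes each unordered pair $(i, j)$ with $i \le j$ appears once, including squares such as $x_1 x_1$, which is consistent with the $w_{1,1}, w_{1,2}, w_{1,3}$ terms shown. The function name `pairwise_expand` is hypothetical.

```python
from itertools import combinations_with_replacement


def pairwise_expand(x):
    """Return [x_1, ..., x_n] followed by every product x_i * x_j with i <= j.

    Products are ordered x_1*x_1, x_1*x_2, ..., x_1*x_n, x_2*x_2, ...,
    in the same order as the w_{i,j} terms of the extended model.
    """
    n = len(x)
    products = [x[i] * x[j] for i, j in combinations_with_replacement(range(n), 2)]
    return list(x) + products


x = [2.0, 3.0, 5.0]
expanded = pairwise_expand(x)
print(expanded)  # [2.0, 3.0, 5.0, 4.0, 6.0, 10.0, 9.0, 15.0, 25.0]
```

Any linear model can then be fit on the expanded vector. Note that its length is $n + n(n+1)/2$, so it grows quadratically: 100 input features become 5,150. scikit-learn's `PolynomialFeatures(degree=2, include_bias=False)` generates the same set of terms.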
ISBN: 9787115509680