精通特征工程

by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
Content preview from 精通特征工程
Fancy Tricks with Simple Numbers
$$\tilde{x} = \frac{x - \operatorname{mean}(x)}{\sqrt{\operatorname{var}(x)}}$$
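It first subtracts the feature's mean (computed over all data points) and then divides by the square root of the variance, that is, the standard deviation, which is why it is also called variance scaling. The scaled feature has a mean of 0 and a variance of 1. If the original feature follows a Gaussian distribution, the scaled feature does too. Figure 2-16 illustrates this standardization.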
Figure 2-16. Illustration of feature standardization
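The standardization formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's own code: the function name `standardize` and the sample values in `feature` are assumptions made for the example, and the population variance is used so that the scaled feature has variance exactly 1.

```python
import math
import statistics


def standardize(values):
    """Return (x - mean) / sqrt(var) for each x, using the population variance."""
    mean = statistics.fmean(values)
    std = math.sqrt(statistics.pvariance(values))
    if std == 0:
        # A constant feature has zero variance and cannot be scaled this way.
        raise ValueError("cannot standardize a constant feature")
    return [(x - mean) / std for x in values]


# Hypothetical feature values (mean 5, population variance 4).
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = standardize(feature)

print(scaled)
print(statistics.fmean(scaled), statistics.pvariance(scaled))
```

After scaling, the mean of `scaled` is 0 and its population variance is 1, which matches the properties stated in the text.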
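Don't "center" sparse data

Take care when applying min-max scaling or standardization to sparse features. Both subtract a quantity from the original feature values. For min-max scaling, the shift is the minimum of all values of the current feature; for standardization, it is the mean. If the shift is not 0, these two transforms turn a sparse feature vector in which most entries are 0 into a dense one. Depending on the implementation, this change can place a heavy computational burden on the classifier. (Under the current representation, the feature vector would then carry an entry for every word that does not appear in a document; needless to say, densifying such a vector is very costly.) Bag-of-words is a sparse representation, and most implementations of classification algorithms are optimized for sparse inputs.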
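The effect of the shift on sparsity can be seen with a small plain-Python sketch. The helper names (`nonzero_count`, `min_max_scale`, `center`) and the word-count column `counts` are hypothetical, chosen only for this illustration. Here the minimum happens to be 0, so min-max scaling keeps the zeros, while subtracting the nonzero mean fills every entry.

```python
def nonzero_count(values):
    """Count the entries a sparse format would actually have to store."""
    return sum(1 for v in values if v != 0)


def min_max_scale(values):
    """Shift by the minimum, then divide by the range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


def center(values):
    """Shift by the mean (the subtraction step of standardization)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical count of one word across 10 documents: mostly zeros.
counts = [0, 0, 3, 0, 0, 0, 1, 0, 0, 0]

print(nonzero_count(counts))                  # 2 stored entries
print(nonzero_count(min_max_scale(counts)))   # shift is 0, still 2
print(nonzero_count(center(counts)))          # shift is 0.4, all 10 become nonzero
```

With a vocabulary of many thousands of words, the same effect applies to every feature column at once, which is why the shift is worth checking before scaling sparse data.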
2.4.3 ℓ2 normalization
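This normalization technique divides the original feature value by a quantity called the ℓ2 norm, also known as the Euclidean norm. It is defined as follows:

$$\tilde{x} = \frac{x}{\lVert x \rVert_2}$$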
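The ℓ2 norm measures the length of a vector in coordinate space. Its definition can be derived from the well-known Pythagorean theorem (given the lengths of the two legs of a right triangle, we can find the length of the hypotenuse): ...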
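As a rough sketch of ℓ2 normalization, the following plain-Python example assumes the standard definition of the Euclidean norm, the square root of the sum of squared components. The function names `l2_norm` and `l2_normalize` and the sample vector are assumptions for illustration; returning a zero vector unchanged is a design choice of this sketch, not something stated in the text.

```python
import math


def l2_norm(x):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(v * v for v in x))


def l2_normalize(x):
    """Divide every component by the l2 norm of the vector."""
    norm = l2_norm(x)
    if norm == 0:
        # Assumption of this sketch: leave an all-zero vector as it is.
        return list(x)
    return [v / norm for v in x]


# A 3-4-5 right triangle: the legs are the components, the hypotenuse is the norm.
x = [3.0, 4.0]
unit = l2_normalize(x)

print(l2_norm(x))
print(unit)
print(l2_norm(unit))
```

The normalized vector keeps its direction but has length 1, so the legs 3 and 4 become 0.6 and 0.8.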
Publisher Resources

ISBN: 9787115509680