精通特征工程
by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
Dimensionality Reduction: Squashing Data with PCA
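... explain roughly 40% of the total variance in the dataset. That is far from a great result, but it makes a low-dimensional visualization convenient. We can see that PCA groups similar digits close together: 0 and 6 lie in the same region, as do 1 and 7, and 3 and 9. The space is roughly divided with 0, 4, and 6 on one side and the remaining digits on the other.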
Figure 6-3. PCA projection of a subset of the MNIST data; markers correspond to image labels
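Because the digits still overlap considerably, it would be hard to separate them with a linear classifier in the projected space. So if the task is to classify handwritten digits and the chosen model is a linear classifier, the first 3 principal components alone are not enough as features. Even so, it is interesting to see how a 64-dimensional dataset is mapped into three-dimensional space.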
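As a rough illustration of how the share of variance explained by the first few principal components can be computed, here is a minimal NumPy sketch. The helper name `pca_project` and the synthetic 64-dimensional data are assumptions made for this example; they are not the dataset or the code behind Figure 6-3.

```python
import numpy as np

def pca_project(X, k):
    """Project X onto its first k principal components.

    Returns the projected data and the fraction of total variance
    captured by those k components. (Hypothetical helper for illustration.)
    """
    Xc = X - X.mean(axis=0)                       # PCA assumes centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                             # coordinates in the top-k subspace
    ratio = (s[:k] ** 2).sum() / (s ** 2).sum()   # squared singular values ~ variance
    return Z, ratio

# Synthetic stand-in for a 64-dimensional dataset (not the digits data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64)) * np.linspace(3.0, 0.5, 64)

Z, ratio = pca_project(X, 3)
print(Z.shape)
print(f"first 3 components explain {ratio:.1%} of the total variance")
```

If scikit-learn is available, a real 64-dimensional digits array such as `sklearn.datasets.load_digits().data` could be passed in place of the synthetic `X`.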
6.4 Whitening and ZCA
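Because of the orthogonality constraint in the objective function, the PCA transformation has a very nice side effect: the transformed features are uncorrelated. In other words, the inner product between every pair of distinct feature vectors is 0. This is easy to prove using the orthogonality of the singular vectors: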
$$Z^T Z = \Sigma_k U_k^T U_k \Sigma_k = \Sigma_k^2$$
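The result is a diagonal matrix with the squared singular values on its diagonal. Each entry is the correlation of a feature vector with itself, that is, its squared ℓ2 norm.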
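The identity above can be checked numerically. The sketch below assumes the projection is Z = X V_k = U_k Σ_k for centered data X with thin SVD X = U Σ V^T; the random data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X = X - X.mean(axis=0)                 # center the data

k = 3
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:k].T                       # Z = X V_k, which equals U_k Sigma_k

gram = Z.T @ Z                         # inner products between the new features
print(np.round(gram, 6))               # off-diagonal entries are numerically zero
print(np.round(s[:k] ** 2, 6))         # diagonal matches the squared singular values
```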
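Sometimes it is also useful to normalize the features so that each has length 1. In signal processing, this operation is called whitening. It yields a set of features that have zero correlation with each other and unit correlation with themselves. Mathematically, whitening can be done by multiplying the PCA transformation by the inverse singular values (see Equation 6-11).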
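A minimal sketch of whitening as described here, assuming the projection X V_k is divided column by column by the corresponding singular values and that those singular values are nonzero. The function name `pca_whiten` and the random data are hypothetical.

```python
import numpy as np

def pca_whiten(X, k):
    """PCA projection scaled by inverse singular values: X V_k Sigma_k^{-1}.

    Assumes the first k singular values are nonzero.
    (Hypothetical helper for illustration.)
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (Xc @ Vt[:k].T) / s[:k]     # divide each projected column by its singular value

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated features

W = pca_whiten(X, 3)
print(np.round(W.T @ W, 6))            # approximately the 3x3 identity matrix
```

With this scaling each whitened feature has unit length rather than unit variance; rescaling by a constant that depends on the number of samples would give unit variance instead.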
ISBN: 9787115509680