精通特征工程 (Mastering Feature Engineering)

by Alice Zheng, Amanda Casari
Posts & Telecom Press, April 2019, 172 pages (Chinese edition)
Dimensionality Reduction: Squashing the Data with PCA
6.2.6 Transforming Features
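Once the principal components have been found, the features can be transformed with a linear projection. Let $X = U\Sigma V^T$ be the singular value decomposition (SVD) of $X$, and let $V_k$ be the matrix whose columns are the first $k$ right singular vectors. $X$ has dimensions $n \times d$, where $d$ is the number of original features, and $V_k$ has dimensions $d \times k$. Rather than projecting onto a single vector as in Equation 6-2, we can use a projection matrix to project onto several vectors at once (see Equation 6-9).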
Equation 6-9. PCA projection matrix

$$W = V_k$$
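Projecting onto a matrix is the same as projecting onto each of its columns separately and stacking the results. A minimal NumPy sketch, assuming random toy data and, purely as a stand-in for $V_k$, an arbitrary $d \times k$ matrix with orthonormal columns taken from a QR decomposition (the helper name `project_onto` is hypothetical):

```python
import numpy as np

def project_onto(X, W):
    """Project each row of X (n x d) onto the columns of W (d x k): Z = X W."""
    return X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                    # toy data: n=6 samples, d=4 features
# Stand-in for V_k (assumption): any d x k matrix with orthonormal columns.
W, _ = np.linalg.qr(rng.normal(size=(4, 2)))   # reduced QR gives a 4 x 2 Q

Z = project_onto(X, W)                         # n x k, all projections at once
# Column j of Z equals the projection onto the single vector W[:, j].
per_vector = np.column_stack([X @ W[:, j] for j in range(W.shape[1])])
print(Z.shape, np.allclose(Z, per_vector))     # (6, 2) True
```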
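The matrix of projected coordinates is easy to compute, and it can be simplified further using the fact that the singular vectors are mutually orthonormal (see Equation 6-10).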
Equation 6-10. Simple PCA transformation

$$Z = XW = XV_k = U\Sigma V^T V_k = U_k \Sigma_k$$
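The projected values are simply the first $k$ left singular vectors (the columns of $U_k$) scaled by the first $k$ singular values. The entire PCA solution, components and projections alike, can therefore be read off conveniently from the SVD of $X$.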
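The identity $XV_k = U_k\Sigma_k$ can be checked numerically. A small NumPy sketch on random toy data (the data and $k = 2$ are arbitrary choices); the identity holds for any $X$, though for PCA proper $X$ would be centered first, as described in 6.2.7:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))     # toy data matrix, n=8, d=5
k = 2

# Thin SVD: X = U @ diag(s) @ Vt, singular values s in descending order
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T                  # first k right singular vectors as columns (d x k)

Z_projection = X @ V_k          # Z = X W with W = V_k (Equation 6-9)
Z_from_svd = U[:, :k] * s[:k]   # U_k Sigma_k: column j of U_k scaled by s[j]
print(np.allclose(Z_projection, Z_from_svd))   # True
```

Broadcasting `U[:, :k] * s[:k]` avoids building the diagonal matrix $\Sigma_k$ explicitly; it is equivalent to `U[:, :k] @ np.diag(s[:k])`.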
6.2.7 Implementing PCA
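Many derivations of PCA first center the data and then take the eigendecomposition of the covariance matrix. The easiest way to implement PCA, however, is to take the singular value decomposition of the centered data matrix.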
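Both routes give the same components. A minimal NumPy sketch comparing them on synthetic correlated data, assuming the sample covariance convention $C^T C/(n-1)$ (with $1/n$ the eigenvalues would scale accordingly): the covariance eigenvalues equal $\sigma_i^2/(n-1)$, and the eigenvectors match the right singular vectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(2)
mix = np.array([[3.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 0.5, 0.2]])
X = rng.normal(size=(50, 3)) @ mix   # correlated toy data (assumption)
n = X.shape[0]
C = X - X.mean(axis=0)               # center each feature (column)

# Route 1: eigendecomposition of the sample covariance matrix
cov = C.T @ C / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]             # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix (no covariance matrix formed)
_, s, Vt = np.linalg.svd(C, full_matrices=False)

print(np.allclose(eigvals, s**2 / (n - 1)))   # True
# Eigenvectors are only defined up to sign, so compare absolute cosines.
cosines = np.sum(eigvecs * Vt.T, axis=0)
print(np.allclose(np.abs(cosines), 1.0))      # True
```

Working on $C$ directly also avoids squaring the data, which is one reason the SVD route is often preferred numerically.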
Steps to implement PCA
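(1) Center the data matrix:

$$C = X - \mathbf{1}\mu^T$$

where $\mathbf{1}$ is the all-ones column vector of length $n$, and $\mu$ is the column vector obtained by averaging the rows of $X$, so that entry $j$ of $\mu$ is the mean of feature (column) $j$.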
(2) Compute the SVD:

$$C = U\Sigma V^T$$
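(3) Find the principal components. The first $k$ principal components are the first $k$ columns of $V$, that is, the right singular vectors corresponding to the $k$ largest singular values.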
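(4) Transform the data. The transformed data is the first $k$ columns of $U$ scaled by the corresponding singular values, $Z = U_k\Sigma_k$, as in Equation 6-10. (If whitening is wanted, multiply each of these vectors by the reciprocal of its singular value, which leaves just the first $k$ columns of $U$. This requires that none of the selected singular values is 0. See Section 6.4.)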
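The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the book's code: the function name `pca_svd`, its `whiten` flag, and the near-zero tolerance are hypothetical choices. Note that the whitened output $U_k$ has orthonormal columns, i.e. unit norm rather than unit variance; multiply by $\sqrt{n-1}$ if unit sample variance is wanted.

```python
import numpy as np

def pca_svd(X, k, whiten=False):
    """Hypothetical helper: PCA via SVD of the centered data (steps 1-4).

    Returns (Z, components, mu): Z is n x k, components is d x k with the
    principal components as columns, mu is the per-feature mean.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                                # step 1: per-feature means
    C = X - mu                                         # C = X - 1 mu^T via broadcasting
    U, s, Vt = np.linalg.svd(C, full_matrices=False)   # step 2: C = U Sigma V^T
    if not 1 <= k <= len(s):
        raise ValueError("k must be between 1 and min(n, d)")
    components = Vt[:k].T                              # step 3: first k right singular vectors
    if whiten:
        # Whitening divides by the selected singular values, so none may be ~0.
        if np.any(s[:k] <= np.finfo(float).eps * max(s[0], 1.0)):
            raise ValueError("cannot whiten: a selected singular value is (numerically) zero")
        Z = U[:, :k]                                   # (U_k Sigma_k) Sigma_k^{-1} = U_k
    else:
        Z = U[:, :k] * s[:k]                           # step 4: Z = U_k Sigma_k
    return Z, components, mu

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4)) * [5.0, 2.0, 1.0, 0.1]  # toy data with unequal spreads
Z, W, mu = pca_svd(X, k=2)
print(np.allclose(Z, (X - mu) @ W))                    # True: Z = C W
```

Returning `mu` alongside the components makes it possible to transform new rows later with the same centering, as `(X_new - mu) @ W`.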
ISBN: 9787115509680