精通特征工程
by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
Dimensionality Reduction: Squashing Data with PCA
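PCA uses linear projection to transform data into a new feature space. Figure 6-2c illustrates linear projection. When $\mathbf{x}$ is projected onto $\mathbf{v}$, the length of the projection is proportional to the inner product of the two vectors, normalized by the norm of $\mathbf{v}$ (the square root of the inner product of $\mathbf{v}$ with itself). We then constrain $\mathbf{v}$ to have unit norm, so the only part that matters is the numerator, which we call $z$ (see Equation 6-1).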
Equation 6-1. Projection coordinate
$$z = \mathbf{x}^{T}\mathbf{v}$$
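As a small illustration of Equation 6-1, the following sketch projects one point onto a direction using NumPy. The values of `x` and `v_raw` are made up for the example and do not come from the book; `v_raw` is normalized first so that `v` satisfies the unit-norm constraint.

```python
import numpy as np

# Hypothetical data point and direction (illustrative values only).
x = np.array([3.0, 4.0])
v_raw = np.array([1.0, 1.0])

# Constrain v to have unit norm, as the text requires.
v = v_raw / np.linalg.norm(v_raw)

# Equation 6-1: the projection coordinate is the inner product x^T v.
z = x @ v

# Without the unit-norm constraint, the same length needs an explicit
# division by the norm of the direction vector.
z_general = (x @ v_raw) / np.linalg.norm(v_raw)
```

Both expressions give the same scalar, which is why fixing the norm of `v` to 1 lets the denominator be dropped.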
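Note that although $\mathbf{x}$ and $\mathbf{v}$ are column vectors, $z$ is a scalar. Because there are many data points, we can build a vector $\mathbf{z}$ that holds the projection coordinates of all data points on the new feature $\mathbf{v}$ (see Equation 6-2). Here $X$ is the familiar data matrix in which each row is a data point. The result $\mathbf{z}$ is a column vector.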
Equation 6-2. Vector of projection coordinates
$$\mathbf{z} = X\mathbf{v}$$
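Equation 6-2 can be sketched with a single matrix-vector product. The data matrix `X` and direction `v` below are hypothetical values chosen for illustration; note that NumPy returns a one-dimensional array here rather than an explicit column vector.

```python
import numpy as np

# Hypothetical data matrix: 4 data points (rows), 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])

# Unit-norm direction, chosen for illustration.
v = np.array([1.0, 0.0])

# Equation 6-2: one projection coordinate per data point.
z = X @ v

# Each entry equals Equation 6-1 applied to the corresponding row of X.
z_rowwise = np.array([x @ v for x in X])
```

With this choice of `v`, the projection simply picks out the first feature of every row, which makes the result easy to check by eye.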
6.2.2 Variance and Empirical Variance
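The next step is to compute the variance of the projections. Variance is the expected value of the squared distance from the mean (see Equation 6-3).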
Equation 6-3. Variance of a random variable $Z$
$$\operatorname{Var}(Z) = E\left[(Z - E(Z))^{2}\right]$$
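There is one small problem: our formulation of the problem never mentions the mean $E(Z)$, so it is a free variable. One way to deal with this is to subtract the mean from every data point, which removes it from the equation. The resulting dataset has mean 0, so the variance is simply the expected value of $Z^{2}$. Geometrically, subtracting the mean centers the data (see Figures 6-2a and 6-2b).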
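The effect of centering can be checked numerically. The sketch below uses randomly generated data (an assumption for illustration, not data from the book) and a hand-picked unit vector `v`; after subtracting the per-feature mean, the projections have mean zero and their empirical variance equals the mean of their squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points, 3 features, deliberately shifted off the origin.
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))

# Unit-norm direction: sqrt(1 + 4 + 4) = 3.
v = np.array([1.0, 2.0, 2.0]) / 3.0

# Center the data by subtracting the mean of each feature (column).
X_centered = X - X.mean(axis=0)

# Projection coordinates of the centered data.
z = X_centered @ v

# With zero-mean projections, the empirical variance is the mean of z squared.
var_from_squares = np.mean(z ** 2)

# Reference value: NumPy's population variance (ddof=0 by default).
var_reference = np.var(z)
```

The comparison uses the population form of the variance (dividing by the number of points); the unbiased estimator that divides by one less would differ by a constant factor.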
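A quantity closely related to variance is the covariance between two random variables $Z_1$ and $Z_2$ (see Equation 6-4). It can be viewed as the extension of variance, which is defined for a single random variable, to a pair of random variables.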
Equation 6-4. Covariance between two random variables $Z_1$ and $Z_2$
$$\operatorname{Cov}(Z_1, Z_2) = E\left[(Z_1 - E(Z_1))(Z_2 - E(Z_2))\right]$$
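A minimal sketch of the empirical counterpart of covariance follows, assuming the standard definition as the mean of the product of deviations from the mean. The samples `z1` and `z2` are synthetic and the helper `empirical_cov` is a hypothetical name introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic samples: z2 depends linearly on z1 plus a little noise.
z1 = rng.normal(size=500)
z2 = 0.5 * z1 + rng.normal(scale=0.1, size=500)


def empirical_cov(a, b):
    """Mean of the product of deviations from the mean (population form)."""
    return np.mean((a - a.mean()) * (b - b.mean()))


# Covariance between the two samples.
cov_12 = empirical_cov(z1, z2)

# Covariance of a variable with itself is its variance.
var_1 = empirical_cov(z1, z1)

# Reference value from NumPy; bias=True divides by N to match the helper.
cov_reference = np.cov(z1, z2, bias=True)[0, 1]
```

Passing the same sample twice recovers the variance, which is the sense in which covariance generalizes variance to two random variables.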
ISBN: 9787115509680