精通特征工程 (Mastering Feature Engineering)

by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
Chapter 6. Dimensionality Reduction: Squashing Data with PCA
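With automatic data collection and feature generation techniques, one can quickly obtain a large number of features, but not all of them are useful. Chapters 3 and 4 discussed frequency-based filtering and feature scaling as ways of pruning away uninformative features. Here we take a close look at reducing feature dimensionality with principal component analysis (PCA).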
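Starting with this chapter, we study model-based feature engineering techniques. Most of the techniques introduced so far can be defined without reference to the data. For example, frequency-based filtering can be stated as "remove all counts smaller than n," a rule that can be carried out without knowing anything more about the data itself.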
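As a minimal sketch of such a data-independent rule, the filter below drops every item whose count falls below a fixed threshold. The function name `filter_by_count`, the threshold value, and the toy tokens are illustrative assumptions, not taken from the book.

```python
from collections import Counter


def filter_by_count(counts, n):
    """Keep only the items whose count is at least n."""
    return {item: c for item, c in counts.items() if c >= n}


# Toy corpus (hypothetical data for illustration only).
tokens = "the cat sat on the mat the cat ran".split()
counts = Counter(tokens)

# The threshold n is fixed up front; the rule itself does not adapt
# to the data it is applied to.
kept = filter_by_count(counts, n=2)
print(kept)
```

The rule is fully specified by `n` alone, which is what distinguishes it from the model-based techniques discussed next.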
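Model-based techniques, by contrast, require information from the data. For example, PCA is defined in terms of the principal axes of the data. In earlier chapters there was always a clear line between data, features, and models, but from this chapter onward the distinction becomes increasingly blurry. This is exactly where current research on feature learning is focused.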
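To illustrate what "defined in terms of the principal axes of the data" can mean in practice, here is a small sketch that computes the first principal axis through a singular value decomposition of the centered data. The helper `first_principal_axis` and the two toy datasets are assumptions made for illustration; the book's own derivation may proceed differently.

```python
import numpy as np


def first_principal_axis(X):
    """Return a unit vector along the direction of largest variance in X."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return Vt[0]


# Two toy datasets (hypothetical): points along y = x and along y = -x.
t = np.linspace(-1.0, 1.0, 21)
X_up = np.column_stack([t, t])
X_down = np.column_stack([t, -t])

axis_up = first_principal_axis(X_up)
axis_down = first_principal_axis(X_down)

# The same procedure yields different axes because the axes come from the data.
print(axis_up, axis_down)
```

The sign of each returned axis is arbitrary, so only its direction is meaningful.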
6.1 Intuition
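Dimensionality reduction is about getting rid of "uninformative information" while retaining the important parts. There are many ways to define "uninformative"; PCA focuses on linear dependency. In Section A.2 of the appendix, we describe the column space of the data matrix as the span of all feature vectors. If the dimension of the column space (the rank of the data matrix) is much smaller than the total number of features, then most of the features are linear combinations of a few key features. Linearly dependent features waste space and computation, because the information they carry can be derived from far fewer features. To avoid this, PCA tries to remove the "fluff" by squashing the data into a linear subspace of much lower dimension than the original space.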
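The following sketch makes the linear-dependency argument concrete under stated assumptions: it builds a hypothetical data matrix with five features, three of which are linear combinations of the other two, checks its rank, and then projects the centered data onto the top two principal axes. The random data, the mixing coefficients, and the choice `k = 2` are all illustrative, not values from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "key" features (hypothetical random data for illustration).
key = rng.normal(size=(100, 2))

# Three more features built as linear combinations of the two key ones.
mixing = np.array([[1.0, 2.0, -1.0],
                   [0.5, -1.0, 3.0]])
X = np.hstack([key, key @ mixing])    # 100 samples, 5 features

rank = np.linalg.matrix_rank(X)       # dimension of the column space

# PCA via SVD of the centered data.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 2
Z = X_centered @ Vt[:k].T             # squashed data, shape (100, k)
X_restored = Z @ Vt[:k]               # mapped back to 5 dimensions
max_error = np.abs(X_restored - X_centered).max()

print(rank, Z.shape, max_error)
```

Because the three extra features carry no information beyond the two key ones, the two-dimensional projection reconstructs the centered data up to floating-point rounding. With real data the dependency is rarely exact, so some reconstruction error would remain.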
In the figure, we plot the set of data points in feature space. Each dot represents a data point, and the whole set of data points forms ...
ISBN: 9787115509680