精通特征工程

by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
Fancy Tricks with Simple Numbers
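... such a model, some feature selection techniques require training more than one candidate model. In other words, feature selection is not about reducing training time (in fact, some techniques increase overall training time) but about reducing model scoring time.
Roughly speaking, feature selection techniques fall into three classes.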
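Filtering
Filtering techniques preprocess features to remove those that are unlikely to be useful to the model. For example, we can compute the correlation or mutual information between each feature and the response variable, then filter out the features that fall below a threshold. Chapter 3 discusses this kind of technique for text features. Filtering is much cheaper than the wrapper techniques described next, but it does not take into account the model we intend to use, so it may fail to select the right features for that model. Prefiltering is best applied conservatively, so that useful features are not inadvertently removed before they reach the model training stage.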
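As a minimal sketch of the filtering idea described above, the following Python snippet scores each feature by its Pearson correlation with the response and keeps the features at or above a threshold. The toy data, the feature names, the use of the absolute value, and the 0.5 threshold are all assumptions made for illustration, not values from the book.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_by_correlation(columns, response, threshold):
    """Keep the names of features whose |correlation| with the response meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, response)) >= threshold
    ]


# Hypothetical toy data: one response and four candidate features.
response = [1, 2, 3, 4, 5, 6]
columns = {
    "doubled": [2, 4, 6, 8, 10, 12],      # perfectly correlated
    "alternating": [1, -1, 1, -1, 1, -1],  # weakly correlated
    "reversed": [6, 5, 4, 3, 2, 1],       # perfectly anti-correlated
    "constant": [3, 3, 3, 3, 3, 3],       # carries no information
}

kept = filter_by_correlation(columns, response, threshold=0.5)
print(kept)
```

Nothing in this procedure consults the downstream model, which is what makes the filter cheap and also why it can discard a feature that the model would have found useful.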
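Wrapper methods
These techniques are very expensive, but they let us try out subsets of features, which means we will not accidentally remove features that are uninformative on their own but very useful in combination with others. A wrapper method treats the model as a black box that returns a quality score for a proposed subset of features. A separate procedure then iteratively refines the subset.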
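The wrapper idea can be sketched as follows, under stated assumptions: the black-box model is a hypothetical majority-vote lookup table, the subset search is a simple greedy backward elimination, and the toy target is the XOR of two binary features plus one irrelevant feature. None of these choices comes from the book; they only illustrate how scoring whole subsets retains features that are useless alone but informative together.

```python
from collections import Counter, defaultdict
from itertools import product


def table_model_accuracy(rows, labels, subset):
    """Black-box scorer (illustrative): accuracy of a majority-vote lookup
    table that sees only the columns listed in `subset`."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[j] for j in subset)
        votes[key][label] += 1
    # Each key predicts its most common label, so it gets that many rows right.
    correct = sum(max(counter.values()) for counter in votes.values())
    return correct / len(rows)


def backward_elimination(rows, labels, score):
    """Start from all features; repeatedly drop one whose removal does not lower the score."""
    selected = list(range(len(rows[0])))
    best = score(rows, labels, selected)
    improved = True
    while improved and selected:
        improved = False
        for j in list(selected):
            candidate = [k for k in selected if k != j]
            candidate_score = score(rows, labels, candidate)  # retrains the model
            if candidate_score >= best:
                selected, best = candidate, candidate_score
                improved = True
                break
    return selected, best


# Toy data: every combination of three binary features.
# The label is x0 XOR x1; x2 is irrelevant.
rows = list(product([0, 1], repeat=3))
labels = [row[0] ^ row[1] for row in rows]

selected, best = backward_elimination(rows, labels, table_model_accuracy)
print(selected, best)
```

Taken alone, x0 and x1 each score no better than chance here, yet the search keeps both because removing either one lowers the subset's score. For brevity the score is computed on the training rows; a held-out validation score would be the more usual choice. Every call to the scorer refits the model, which is where the cost of wrapper methods comes from.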
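Embedded methods
These methods perform feature selection as part of the model training process. For example, feature selection is inherent to a decision tree, because at each training step it selects one feature on which to split the tree. Another example is the ℓ1 regularizer, which can be added to the training objective of any linear model. The ℓ1 regularizer encourages the model to use fewer features rather than more, so it is also known as a sparsity constraint on the model. Embedded methods are not as powerful as wrapper methods, but they are also nowhere near as expensive. Compared to filtering, embedded methods select features that are specific to the model. In this sense, ...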
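A minimal sketch of ℓ1 regularization acting as an embedded selector, assuming a squared-error linear model fit by coordinate descent. The objective used here, (1/(2n)) * sum of squared residuals + lam * sum of |w_j|, along with the toy data and the value lam = 0.5, are illustrative assumptions rather than anything specified in the book.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(rows, targets, lam, sweeps=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1, one coordinate at a time."""
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Average product of feature j with the residual that ignores feature j.
            rho = sum(
                row[j] * (y - sum(weights[k] * row[k] for k in range(d) if k != j))
                for row, y in zip(rows, targets)
            ) / n
            scale = sum(row[j] ** 2 for row in rows) / n
            weights[j] = soft_threshold(rho, lam) / scale if scale > 0 else 0.0
    return weights


# Toy data with mutually orthogonal feature columns (an assumption that keeps
# the arithmetic easy to follow). The targets equal 3 * x0 + 0.25 * x2.
rows = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
targets = [3.25, 2.75, -3.25, -2.75]

weights = lasso_coordinate_descent(rows, targets, lam=0.5)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(weights, selected)
```

The selection happens inside training: the soft-threshold step sets a weight to exactly zero whenever the feature's contribution falls below lam, so the weakly useful x2 is dropped along with the irrelevant x1, and the surviving weight on x0 is shrunk from 3 to 2.5. A larger lam yields a sparser model; a smaller one keeps more features.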
Publisher Resources

ISBN: 9787115509680