精通特征工程

by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
The Machine Learning Pipeline
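Mathematical formulas relate numeric variables to one another, but raw data is often not numeric. (The action "Alice bought the Lord of the Rings trilogy on Wednesday" is not numeric, and neither is the review she later writes about the book.) Something has to connect the two, and this is where features come in.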
1.4 Features
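A feature is a numeric representation of raw data. There are many ways to turn raw data into a numeric representation, so features can take many forms. Naturally, features must be derived from the type of data that is available. Less obviously, features are also tied to the model: some models are better suited to certain types of features, and vice versa. The right features fit the task at hand and are easy for the model to use. Feature engineering is the process of designing the most appropriate features given the data, the model, and the task.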
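To make the idea of a numeric representation concrete, here is a minimal sketch that turns the purchase event from the earlier example into a handful of numbers. The field names, the specific date, the review text, and the choice of features are all assumptions made for illustration; they are not taken from the book, and other encodings would be equally reasonable.

```python
from datetime import date

# Hypothetical raw event; every field name and value here is illustrative.
raw_event = {
    "user": "Alice",
    "item": "The Lord of the Rings trilogy",
    "purchase_date": date(2019, 4, 3),  # this date falls on a Wednesday
    "review": "Loved it. A classic that rewards rereading.",
}


def featurize(event):
    """Map one raw purchase event to a dict of numeric features."""
    weekday = event["purchase_date"].weekday()  # Monday=0 ... Sunday=6
    # One-hot encode the day of the week so that no ordering is implied.
    features = {f"bought_on_day_{d}": int(d == weekday) for d in range(7)}
    features["is_weekend"] = int(weekday >= 5)
    # A crude numeric summary of the free-text review.
    features["review_word_count"] = len(event["review"].split())
    return features


features = featurize(raw_event)
print(features)
```

Whether a one-hot day of week or a word count is useful depends on the task and the model, which is exactly the point of the paragraph above.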
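The number of features also matters. If there are not enough informative features, the model will be unable to perform the final task. If there are too many features, or if most of them are irrelevant, the model becomes harder and more expensive to train, and errors that hurt the model's performance may creep in during training.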
1.5 Model Evaluation
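Features and models sit between raw data and the insights we want to obtain (see Figure 1-2). In a machine learning workflow, we choose not only the model but also the features. The two are complementary, and the choice of one affects the other. Good features make the subsequent modeling step easier, and the resulting model is better able to complete the required task. Bad features require a much more complex model to reach the same level of performance. In the rest of this book, we will cover different kinds of features and discuss their pros and cons for different types of data and models. Without further ado, let's get started!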
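The claim that good features simplify modeling can be illustrated with a toy sketch. The data below is invented for this purpose: points are labeled by whether they fall inside the unit circle. The "model" is deliberately the simplest one imaginable, a single threshold on a single feature. The helper name `best_threshold_accuracy` and the engineered feature (squared distance from the origin) are assumptions of this example, not something the book prescribes.

```python
import math

# Hypothetical toy data: the label is 1 when a point lies inside the unit circle.
points = [(0.1, 0.2), (-0.3, 0.4), (0.5, -0.5), (0.0, -0.9),
          (1.5, 0.0), (-1.2, 1.1), (0.2, -1.4), (-1.1, -0.3)]
labels = [1 if x * x + y * y < 1.0 else 0 for x, y in points]


def best_threshold_accuracy(values, labels):
    """Best accuracy achievable by a one-feature threshold rule.

    Tries every observed value (plus infinity) as the threshold and
    considers both directions of the rule.
    """
    best = 0.0
    for t in sorted(set(values)) + [math.inf]:
        preds = [1 if v < t else 0 for v in values]
        acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)
        # 1 - acc is the accuracy of the flipped rule (predict 1 when v >= t).
        best = max(best, acc, 1.0 - acc)
    return best


raw_x = [x for x, _ in points]                    # a raw coordinate as the feature
radius_sq = [x * x + y * y for x, y in points]    # an engineered feature

acc_raw = best_threshold_accuracy(raw_x, labels)
acc_engineered = best_threshold_accuracy(radius_sq, labels)
print(acc_raw, acc_engineered)
```

On this data the threshold rule separates the classes perfectly when given the engineered feature, while no threshold on the raw x coordinate does. Matching that performance from raw coordinates would take a more complex model.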
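[Figure 1-2. Diagram labels: data source 1, data source 2, ..., data source n; selection and merging; raw data; cleaning and transformation; features; modeling; insights.]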
Publisher Resources

ISBN: 9787115509680