精通特征工程
by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
Appendix A: Linear Modeling and Linear Algebra Basics

A.1 Overview of Linear Classification
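Given a labeled dataset, the feature space is scattered with data points from different classes, and the job of a classifier is to separate the points of one class from those of another. It does so by producing a different output for points from each class. For example, when there are only two classes, a good classifier should produce large outputs for one class and small outputs for the other. The points that lie exactly between the two classes form a decision surface (see Figure A-1).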
Figure A-1. Simple binary classification finds a surface that separates two classes of data points (axes: feature 1, feature 2)
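The idea of "large output for one class, small output for the other" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear scoring function, the names `score` and `classify`, and the weights and points are all hypothetical and do not come from the book. Points whose score is exactly zero sit on the decision surface.

```python
def score(point, weights, bias):
    """Return the classifier output for one point (a weighted sum plus a bias)."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify(point, weights, bias):
    """Map the output to a class; a zero output means the point is on the boundary."""
    s = score(point, weights, bias)
    if s > 0:
        return "+"
    if s < 0:
        return "-"
    return "on the decision surface"


# Hypothetical parameters: the boundary is the line feature_1 == feature_2.
WEIGHTS = (1.0, -1.0)
BIAS = 0.0

# Hypothetical points given as (feature 1, feature 2).
points = [(3.0, 1.0), (1.0, 3.0), (2.0, 2.0)]
labels = [classify(p, WEIGHTS, BIAS) for p in points]

for p, label in zip(points, labels):
    print(p, score(p, WEIGHTS, BIAS), label)
```

With these assumed weights, the first point receives a positive output, the second a negative one, and the third lies exactly between the two classes.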
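Many functions can serve as classifiers. For several reasons, we should use the simplest function that cleanly separates the classes. First, it is easier to find the best simple separator than the best complex one. Second, simple functions usually generalize better to new data, because a simple function is harder to tailor too specifically to the training data (tailoring a model that closely is known as overfitting). A simple model may make mistakes; in Figure A-1, for instance, some points lie on the wrong side of the boundary. But we would rather sacrifice some training accuracy in exchange for a simpler decision surface that achieves higher test accuracy. This principle of minimizing complexity while maximizing usefulness is known as Occam's razor, and it applies widely in science and engineering.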
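The trade-off between training accuracy and test accuracy can be made concrete with a small sketch. Everything here is an assumption for illustration: the one-dimensional toy data is hand-built (with one deliberately mislabeled training point at -2.0), the simple model is a single threshold, and a 1-nearest-neighbor rule stands in for "a complex function". None of these choices come from the book.

```python
# Hand-built toy data as (feature, label); the underlying rule is "label +1 when x > 0".
# The training point at -2.0 is deliberately mislabeled to act as noise.
TRAIN = [(-4.0, -1), (-3.0, -1), (-2.0, +1), (-1.5, -1), (-1.0, -1),
         (0.5, +1), (1.0, +1), (2.0, +1), (3.0, +1)]
TEST = [(-3.5, -1), (-2.2, -1), (-1.8, -1), (0.7, +1), (1.5, +1), (2.5, +1)]


def accuracy(predict, data):
    """Fraction of points whose predicted label matches the given label."""
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_threshold(data):
    """Simple model: choose the midpoint threshold with the best training accuracy."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates,
               key=lambda t: accuracy(lambda x: 1 if x > t else -1, data))


def nearest_neighbor(data):
    """Complex model: memorize the training set and copy the closest point's label."""
    def predict(x):
        return min(data, key=lambda pair: abs(pair[0] - x))[1]
    return predict


threshold = fit_threshold(TRAIN)


def simple_model(x):
    return 1 if x > threshold else -1


complex_model = nearest_neighbor(TRAIN)

for name, model in [("threshold", simple_model), ("1-NN", complex_model)]:
    print(name, "train:", accuracy(model, TRAIN), "test:", accuracy(model, TEST))
```

On this contrived data the memorizing model fits every training point, including the mislabeled one, and then misclassifies the test points near it. The threshold model gets that one training point wrong but classifies every test point correctly. Real datasets will not behave this neatly; the sketch only shows the direction of the trade-off.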
The simplest function is a straight line. A linear function of one input variable is a familiar sight (see Figure A-2).
Figure A-2. A linear function of one input variable (axes: feature 1, function output)
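A linear function of one input can be written as an intercept plus a slope times the input. The sketch below assumes that form; the name `linear_function` and the intercept and slope values are hypothetical, chosen only to show that equal steps in the input produce equal steps in the output.

```python
def linear_function(x, intercept, slope):
    """Evaluate a straight line at x."""
    return intercept + slope * x


# Hypothetical parameters, evaluated at the inputs 1, 2, and 3.
inputs = (1, 2, 3)
outputs = [linear_function(x, intercept=1.0, slope=2.0) for x in inputs]

# The difference between consecutive outputs is constant and equals the slope.
steps = [b - a for a, b in zip(outputs, outputs[1:])]

print(outputs)
print(steps)
```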
ISBN: 9787115509680