精通特征工程
by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
Content preview from 精通特征工程
Nonlinear Featurization and k-Means Model Stacking
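k-means featurization suits real-valued, bounded numeric features that form clumps of dense regions in space. The clumps can have any shape, because we can increase the number of clusters to approximate them. (Unlike in classic clustering, we do not care about finding the "true" number of clusters; we only need to cover them.)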
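The idea in the paragraph above can be illustrated with a minimal sketch. It assumes scikit-learn's `KMeans` and a made-up two-blob dataset; the helper name `kmeans_featurize` and the choice of `k=4` are hypothetical and not taken from the book. Each point is replaced by a one-hot vector marking its nearest cluster.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_featurize(X_train, X_new, k=10, seed=0):
    """Fit k-means on X_train; return one-hot cluster memberships for X_new."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_train)
    cluster_ids = km.predict(X_new)            # nearest centroid per row
    onehot = np.zeros((len(cluster_ids), k))   # one column per cluster
    onehot[np.arange(len(cluster_ids)), cluster_ids] = 1.0
    return onehot


# Toy data (made up for illustration): two dense blobs in a 2-D space.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(100, 2)),
    rng.normal(3.0, 0.3, size=(100, 2)),
])

# k is deliberately larger than the number of blobs: the goal is to
# cover the dense regions, not to recover a "true" cluster count.
features = kmeans_featurize(X, X, k=4)
```

Choosing more clusters than visible clumps is intentional here; the extra clusters simply tile the dense regions more finely.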
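k-means cannot handle feature spaces where Euclidean distance is not meaningful, that is, oddly distributed numeric variables or categorical variables. If the feature set contains such variables, there are several ways to handle them: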
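(1) Apply k-means featurization only to the real-valued, bounded numeric features.
(2) Define a custom metric that handles multiple data types, and use the k-medoids algorithm. (k-medoids is similar to k-means but allows an arbitrary distance metric.)
(3) Convert the categorical variables to bin-counting statistics (see Section 5.2.2), then featurize them using k-means.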
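Option (2) can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the book's code: the mixed metric (mean of absolute differences on numeric columns scaled to [0, 1] and 0/1 mismatches on integer-coded categorical columns) is a hypothetical choice, and the simple alternating update shown is only one of several k-medoids variants.

```python
import numpy as np


def mixed_distance_matrix(X_num, X_cat):
    """Pairwise distances for mixed data (a hypothetical metric).

    X_num: numeric columns, assumed already scaled to [0, 1].
    X_cat: categorical columns, encoded as integer codes.
    """
    num_part = np.abs(X_num[:, None, :] - X_num[None, :, :]).sum(axis=2)
    cat_part = (X_cat[:, None, :] != X_cat[None, :, :]).sum(axis=2)
    return (num_part + cat_part) / (X_num.shape[1] + X_cat.shape[1])


def k_medoids(D, k, n_iter=100, seed=0):
    """Alternate between assigning points and re-picking medoids."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


# Toy data (made up): one numeric column and one categorical column.
X_num = np.array([[0.10], [0.15], [0.20], [0.80], [0.85], [0.90]])
X_cat = np.array([[0], [0], [0], [1], [1], [1]])

D = mixed_distance_matrix(X_num, X_cat)
medoids, labels = k_medoids(D, k=2)
```

Because the algorithm only ever reads the precomputed matrix `D`, any metric can be swapped in without changing `k_medoids`; the medoids are always actual data points, so no averaging of categorical values is needed.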
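Combined with techniques for handling categorical variables and time series, k-means featurization can be applied to the large volumes of data that often appear in settings such as customer marketing and sales analytics. The resulting clusters can be viewed as user segments, which are very useful features in the subsequent modeling stage.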
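As a rough illustration of combining a categorical technique with k-means, the sketch below replaces a categorical column with a bin-counting statistic (here, the fraction of positive responses per category) and then clusters the result. The column names `channel` and `spend`, the toy values, and the helper `bin_counting` are all made up for this example; scikit-learn's `KMeans` is assumed.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans


def bin_counting(categories, targets):
    """Map each category to the fraction of positive targets observed for it."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for category, target in zip(categories, targets):
        totals[category] += 1
        positives[category] += int(target)
    return {c: positives[c] / totals[c] for c in totals}


# Toy customer data (made up): a categorical channel, a numeric spend
# already scaled to [0, 1], and a binary response.
channel = ["email", "email", "ads", "ads", "ads", "web", "web", "email"]
spend = np.array([0.10, 0.20, 0.80, 0.90, 0.70, 0.40, 0.50, 0.15])
y = np.array([1, 1, 0, 0, 1, 0, 1, 1])

rates = bin_counting(channel, y)
channel_rate = np.array([rates[c] for c in channel])

# Both columns are now real-valued and bounded, so Euclidean distance
# is meaningful and k-means can be applied.
X = np.column_stack([spend, channel_rate])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
segments = km.labels_  # cluster id per customer, usable as a segment feature
```

In practice the counts would normally be computed on data kept separate from the rows used to train the downstream model, since statistics derived from the target can otherwise leak label information.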
7.5 Summary
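This chapter introduced the concept of model stacking through an unconventional approach: combining supervised k-means clustering with a simple linear classifier. k-means is usually used as an unsupervised modeling method whose goal is to find dense clusters of data points in feature space. In this chapter, however, k-means can optionally be given the class labels as input, which helps k-means find clusters that align better with the boundaries between classes.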
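The stacking idea summarized above can be sketched roughly as follows. This is a simplified illustration and not necessarily the chapter's exact implementation: it assumes scikit-learn, appends a scaled copy of the label as an extra input column when fitting k-means, then drops that coordinate from the centroids so new points can be assigned without labels. The helper names, `target_scale=5.0`, `k=20`, and the `make_moons` toy dataset are all assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def fit_kmeans_featurizer(X, y=None, k=20, target_scale=5.0, seed=0):
    """Fit k-means, optionally with the scaled label as an extra column.

    Returns the centroids projected back onto the original feature space.
    """
    if y is None:
        data = X
    else:
        label_col = target_scale * np.asarray(y, dtype=float)[:, None]
        data = np.hstack([X, label_col])
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data)
    # Drop the label coordinate: labels are unavailable at prediction time.
    return km.cluster_centers_[:, : X.shape[1]]


def one_hot_nearest(X, centroids):
    """One-hot encode the nearest centroid (Euclidean) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    ids = d2.argmin(axis=1)
    out = np.zeros((X.shape[0], centroids.shape[0]))
    out[np.arange(X.shape[0]), ids] = 1.0
    return out


# Toy two-class data with a nonlinear boundary (for illustration only).
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Stage 1: label-aware k-means. Stage 2: linear classifier on cluster features.
centroids = fit_kmeans_featurizer(X_tr, y_tr, k=20)
clf = LogisticRegression(max_iter=1000).fit(one_hot_nearest(X_tr, centroids), y_tr)
accuracy = clf.score(one_hot_nearest(X_te, centroids), y_te)
```

Passing `y=None` to `fit_kmeans_featurizer` gives the purely unsupervised variant, which makes it easy to compare the two on the same split.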
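The next chapter discusses deep learning, which takes model stacking to a new level by stacking layers of neural networks on top of one another. Two recent winners of the ImageNet Large Scale Visual Recognition Challenge used neural networks with 13 and 22 layers. They took advantage of the large number of unlabeled training images available, searching them for pixel combinations that yield good image features ...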

ISBN: 9787115509680