Python机器学习基础教程

by Andreas C. Müller, Sarah Guido
January 2018
Intermediate to advanced
301 pages
8h 54m
Chinese
Posts & Telecom Press
Unsupervised Learning and Preprocessing
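Here the clustering appears to have picked out "dark-skinned and smiling," "collared shirt," "smiling woman," "Saddam Hussein," and "high forehead." With a more detailed analysis, we could also find these highly similar clusters using the dendrogram.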
3.5.5 Summary of Clustering Methods
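As this section has shown, applying and evaluating clustering is a highly qualitative procedure that is often most helpful in the exploratory phase of data analysis. We looked at three clustering algorithms: k-means, DBSCAN, and agglomerative clustering. All three offer a way to control the granularity of the clustering. k-means and agglomerative clustering let you specify the desired number of clusters, while DBSCAN lets you define proximity with the eps parameter, which indirectly influences cluster size. All three methods can be used on large, real-world datasets, are relatively easy to understand, and can partition the data into many clusters.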
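The granularity controls described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the book's own listing: the synthetic `make_blobs` data, the specific parameter values, and the helper `n_dbscan_clusters` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.datasets import make_blobs

# Synthetic 2-D data (an assumption for illustration only)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

# k-means and agglomerative clustering: the number of clusters is requested directly
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
agg_labels = AgglomerativeClustering(n_clusters=4).fit_predict(X)


def n_dbscan_clusters(eps, min_samples=5):
    """Hypothetical helper: count the clusters DBSCAN finds, ignoring noise (-1)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return len(set(labels) - {-1})


# DBSCAN: eps defines how close points must be to count as neighbors,
# so changing it changes the number and size of clusters only indirectly
for eps in (0.2, 0.5, 1.0, 100.0):
    print(f"eps={eps}: {n_dbscan_clusters(eps)} cluster(s)")
```

With k-means and agglomerative clustering the granularity is set up front through `n_clusters`. With DBSCAN it has to be read off the result, so trying a few `eps` values is usually part of the exploration.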
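Each algorithm has somewhat different strengths. k-means characterizes each cluster by its mean. It can also be viewed as a decomposition method in which each data point is represented by its cluster center. DBSCAN can detect "noise points" that are not assigned to any cluster, and it can help determine the number of clusters automatically. Unlike the other two methods, it allows clusters to have complex shapes, as we saw in the two_moons example. DBSCAN sometimes produces clusters of very different sizes, which can be a strength or a weakness. Agglomerative clustering can provide a whole hierarchy of possible partitions of the data, which can be easily inspected with a dendrogram.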
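The three strengths named above can each be seen in a few lines of code. The sketch below assumes a two_moons-style dataset generated with `make_moons` and default DBSCAN settings. It also uses SciPy's `ward`, `fcluster`, and `dendrogram` for the hierarchy, which is one possible way to obtain it rather than necessarily the book's.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, ward
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two interleaving half-circles, rescaled to zero mean and unit variance
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# k-means as a decomposition: replace every point by the center of its cluster
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
X_reconstructed = kmeans.cluster_centers_[kmeans.labels_]

# DBSCAN: points labeled -1 are noise; the number of clusters is an output,
# not an input (default eps and min_samples are used here as an assumption)
dbscan_labels = DBSCAN().fit_predict(X_scaled)
n_noise = int(np.sum(dbscan_labels == -1))
n_found = len(set(dbscan_labels) - {-1})
print(f"DBSCAN found {n_found} cluster(s) and {n_noise} noise point(s)")

# Agglomerative clustering: the linkage array encodes the whole hierarchy,
# which can be cut at different levels to obtain coarser or finer partitions
linkage_array = ward(X_scaled)
labels_coarse = fcluster(linkage_array, t=2, criterion="maxclust")
labels_fine = fcluster(linkage_array, t=5, criterion="maxclust")

# dendrogram() normally draws with matplotlib; no_plot=True only computes the layout
tree = dendrogram(linkage_array, no_plot=True)
```

Dropping `no_plot=True` (with matplotlib installed) draws the tree, which is how the hierarchy is usually inspected.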
3.6 Summary and Outlook
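This chapter introduced a range of unsupervised learning algorithms that can be applied for exploratory data analysis and preprocessing. Finding the right representation of the data is often crucial to the success of both supervised and unsupervised learning, and preprocessing and decomposition methods play an important part in data preparation.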
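As a small reminder of what preprocessing and decomposition look like in data preparation, the following sketch rescales features and then projects them with PCA. The random data and the choice of two components are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Hypothetical data: 100 samples, 3 features on very different scales
X = rng.normal(size=(100, 3)) * np.array([1.0, 100.0, 0.01])

# Preprocessing: give every feature zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# Decomposition: keep the two directions of largest variance
pca = PCA(n_components=2).fit(X_scaled)
X_pca = pca.transform(X_scaled)
print(X_pca.shape, pca.explained_variance_ratio_)
```

Without the scaling step, the second feature would dominate the principal components simply because of its larger numeric range.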
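Decomposition, manifold learning, and clustering are all important tools for deepening your understanding of the data and, in the absence of supervision information, are also …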
Publisher Resources

ISBN: 9787115475619