数据科学中的实用统计学(第2版) (Practical Statistics for Data Scientists, 2nd Edition)

by Peter Bruce, Andrew Bruce, Peter Gedeck
October 2021
Intermediate to advanced
289 pages
8h 31m
Chinese
Posts & Telecom Press
Unsupervised Learning
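The clusters produced by mclust may seem surprising, but they reflect the statistical nature of the algorithm. The goal of model-based clustering is to find the best-fitting set of multivariate normal distributions. Judging from the contours in Figure 7-10, the stock data appears to have a normal-looking shape, but stock returns actually follow a long-tailed distribution, not a normal one. To handle this, mclust first fits one distribution to the bulk of the data and then fits a second distribution with a larger variance.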
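
The effect of adding a second, wider component can be illustrated with a small calculation. The sketch below is not taken from mclust; it assumes a simplified one-dimensional, zero-mean case, and the function name `scale_mixture_kurtosis` and the weights and standard deviations are chosen only for illustration. It uses the normal-distribution moments E[x^2] = sd^2 and E[x^4] = 3 sd^4 to show that mixing a narrow and a wide normal gives a kurtosis above 3, that is, heavier tails than a single normal.

```python
def scale_mixture_kurtosis(weight, sd_narrow, sd_wide):
    """Kurtosis of a zero-mean mixture of two normals with different spreads.

    `weight` is the share of observations drawn from the narrow component.
    """
    # Moments of a mixture are weighted sums of the component moments.
    # For a zero-mean normal: E[x^2] = sd^2 and E[x^4] = 3 * sd^4.
    m2 = weight * sd_narrow**2 + (1 - weight) * sd_wide**2
    m4 = 3 * (weight * sd_narrow**4 + (1 - weight) * sd_wide**4)
    return m4 / m2**2


# Illustrative values only: a single normal versus a mixture in which
# 10% of the observations come from a component three times as wide.
single = scale_mixture_kurtosis(1.0, 1.0, 1.0)
mixed = scale_mixture_kurtosis(0.9, 1.0, 3.0)
print(f"single normal: {single:.2f}, mixture: {mixed:.2f}")
```

A single normal has kurtosis 3, so any value above that indicates heavier tails. This is one way to read the two-cluster result: the second cluster absorbs the tail observations that a single normal cannot accommodate.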
7.4.3 Selecting the Number of Clusters
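Unlike K-means and hierarchical clustering, mclust automatically selects the number of clusters (two in this example). It does so by choosing the number of clusters for which the Bayesian Information Criteria (BIC) has the largest value (BIC is similar to AIC; see Section 4.2.4). BIC works by selecting the best-fitting model subject to a penalty on the number of parameters in the model. In model-based clustering, adding more clusters always improves the fit, but at the cost of introducing more parameters into the model.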
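
To make the cost of extra clusters concrete, the following sketch counts the free parameters of a Gaussian mixture and the resulting size of a BIC-style penalty term. It assumes every cluster has its own unconstrained covariance matrix, which is only one of the covariance structures mclust considers, and it assumes the usual penalty of (number of parameters) x ln(number of observations). The function name `gmm_free_parameters` and the example sizes are hypothetical.

```python
import math


def gmm_free_parameters(n_clusters, n_dims):
    """Free parameters of a Gaussian mixture with unconstrained covariances."""
    weights = n_clusters - 1                      # mixing proportions sum to 1
    means = n_clusters * n_dims                   # one mean vector per cluster
    covariances = n_clusters * n_dims * (n_dims + 1) // 2  # symmetric matrices
    return weights + means + covariances


# Assumed sizes for illustration: 2-dimensional data, 1000 observations.
n_dims, n_obs = 2, 1000
for k in range(1, 5):
    n_params = gmm_free_parameters(k, n_dims)
    penalty = n_params * math.log(n_obs)
    print(f"{k} cluster(s): {n_params} parameters, penalty {penalty:.1f}")
```

Under these assumptions each additional cluster adds six parameters in two dimensions, so the improvement in fit has to outweigh a penalty that grows steadily with the number of clusters.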
Note that in most cases BIC is minimized. The authors of the mclust package chose to define BIC with the opposite sign to make the plots easier to interpret.
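
The two sign conventions lead to the same choice of model, as the sketch below suggests. The function names `bic_minimize` and `bic_maximize` are hypothetical, and the log-likelihoods and parameter counts are made-up numbers used only to exercise the formulas; they are not results from the stock data.

```python
import math


def bic_minimize(log_likelihood, n_params, n_obs):
    """BIC in the common form, where smaller values are better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def bic_maximize(log_likelihood, n_params, n_obs):
    """The same quantity with the sign flipped, so larger values are better."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)


# Made-up (log-likelihood, parameter count) pairs keyed by cluster count.
candidates = {1: (-1250.0, 5), 2: (-1180.0, 11), 3: (-1172.0, 17), 4: (-1169.0, 23)}
n_obs = 500  # assumed sample size

best_min = min(candidates, key=lambda k: bic_minimize(*candidates[k], n_obs))
best_max = max(candidates, key=lambda k: bic_maximize(*candidates[k], n_obs))
print(best_min, best_max)
```

Because one quantity is the negative of the other, minimizing the first and maximizing the second always pick the same number of clusters; only the direction of "better" on a plot changes.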
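mclust fits 14 different models, each with an increasing number of clusters, and then automatically selects an optimal model. You can plot the BIC values of these models using a function in mclust: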
plot(mcl, what='BIC', ask=FALSE)
The number of clusters, that is, the number of different multivariate normal models (clusters), is shown on the x-axis (see Figure 7-12).
Figure 7-12. BIC values of 14 models for the stock return data, with an increasing number of clusters
Python ...
Publisher Resources

ISBN: 9787115569028