Practical Statistics for Data Scientists (面向数据科学家的实用统计学)

by Peter Bruce, Andrew Bruce
October 2018
Beginner to intermediate
238 pages
6h 32m
Chinese
Posts & Telecom Press
Unsupervised Learning
7.2.2 K-Means Algorithm
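The K-Means algorithm can likewise be applied to a data set with p variables (X1, ..., Xp). Finding the exact solution to K-Means is computationally very difficult, but heuristic algorithms can efficiently compute a locally optimal solution.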
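The algorithm starts with a user-specified K and an initial set of cluster means, then repeats the following steps:
(1) Assign each record to the cluster with the nearest cluster mean, as measured by squared distance.
(2) Recompute the cluster means based on the current assignment of records.
The algorithm has converged once the assignment of records to clusters no longer changes.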
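The two steps and the convergence rule can be sketched in R as follows. This is a minimal illustration, not the implementation behind R's built-in kmeans function; the function name simple_kmeans, its arguments, and the simulated data are assumptions made for the example.

```r
# Minimal K-Means sketch: alternate the assignment and mean-update steps.
# simple_kmeans and its arguments are hypothetical names for illustration.
simple_kmeans <- function(x, centers, max_iter = 100) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  cluster <- rep(0L, nrow(x))
  iter <- 0L
  for (iter in seq_len(max_iter)) {
    # Step (1): squared Euclidean distance from every record to every mean
    d2 <- sapply(seq_len(k), function(j) {
      rowSums(sweep(x, 2, centers[j, ], "-")^2)
    })
    d2 <- matrix(d2, nrow = nrow(x))
    # Index of the nearest mean for each record
    new_cluster <- max.col(-d2, ties.method = "first")
    # Converged: no record changed cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Step (2): recompute each cluster mean from its assigned records
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers, iterations = iter)
}

# Simulated data: two well-separated groups of 20 records each
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
# Use one record from each group as the initial cluster means
fit <- simple_kmeans(x, centers = x[c(1, 21), ])
table(fit$cluster)
```

A cluster that loses all of its records keeps its previous mean in this sketch; that is a simplification chosen for the example, not a general recommendation.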
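Before the first iteration, you need to specify an initial set of cluster means. A common approach is to randomly assign each record to one of the K clusters and then compute the cluster means.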
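A random-partition initialization of this kind might look like the sketch below. The helper name random_partition_means is hypothetical, and the built-in iris data is only a stand-in. One assumption differs from a purely independent random assignment: the labels are a shuffled, balanced sequence, so that no cluster starts empty.

```r
# Random-partition initialization: give each record a random cluster label,
# then use the resulting cluster means as the starting means.
# random_partition_means is a hypothetical helper name.
random_partition_means <- function(x, k) {
  x <- as.matrix(x)
  stopifnot(nrow(x) >= k)
  # Shuffle the labels 1..k repeated to length n, so every cluster
  # receives at least one record (an assumption of this sketch)
  labels <- sample(rep_len(seq_len(k), nrow(x)))
  # One row of column means per cluster, giving a k x p matrix
  do.call(rbind, lapply(seq_len(k), function(j) {
    colMeans(x[labels == j, , drop = FALSE])
  }))
}

set.seed(42)
x <- as.matrix(iris[, 1:4])
init <- random_partition_means(x, k = 3)
init
```

A matrix like init can be passed as the centers argument of R's kmeans function, which accepts either a number of clusters or a set of initial cluster means.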
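Since the algorithm is not guaranteed to find the best possible solution, it is recommended to run it several times, using a different random sample for each initialization. When more than one set of iterations is used, the K-Means result is given by the set of iterations with the lowest within-cluster sum of squares.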
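The selection rule can be spelled out by hand, as in the sketch below: run kmeans several times from different random starts and keep the run with the lowest total within-cluster sum of squares. The simulated data and the variable names are assumptions for illustration.

```r
set.seed(123)
# Simulated data: three groups in two dimensions (illustrative only)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2),
           matrix(rnorm(100, mean = 8), ncol = 2))

# Run K-Means 10 times, each from a single random start
runs <- lapply(1:10, function(i) kmeans(x, centers = 3, nstart = 1))

# Total within-cluster sum of squares for each run
wss <- sapply(runs, function(r) r$tot.withinss)

# Keep the run with the lowest within-cluster sum of squares
best <- runs[[which.min(wss)]]
best$tot.withinss
```

Comparing the values in wss shows whether the different starts reached the same local optimum or different ones.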
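In R, the nstart argument of the kmeans function sets the number of random starts to try. For example, the following code runs K-Means with 10 different sets of starting cluster means to find 5 clusters: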
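# Ticker symbols of the 16 stocks to cluster on
syms <- c('AAPL', 'MSFT', 'CSCO', 'INTC', 'CVX', 'XOM', 'SLB', 'COP',
          'JPM', 'WFC', 'USB', 'AXP', 'WMT', 'TGT', 'HD', 'COST')
# Keep rows whose row name (a date string) is 2011-01-01 or later
df <- sp500_px[row.names(sp500_px) >= '2011-01-01', syms]
# Find 5 clusters, keeping the best of 10 random starts
km <- kmeans(df, centers=5, nstart=10)
...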
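The sp500_px data frame is not part of base R, so the call above cannot be run without it. As a stand-in, the sketch below makes the same kind of kmeans call on the built-in USArrests data set and inspects a few components of the result. The choice of data set and the scaling step are assumptions made only for this example.

```r
set.seed(2018)
# Stand-in data: the built-in USArrests data set, scaled column by column
x <- scale(USArrests)
# 5 clusters, keeping the best of 10 random starts
km <- kmeans(x, centers = 5, nstart = 10)
km$size          # number of records in each cluster
km$centers       # cluster means, one row per cluster
km$tot.withinss  # total within-cluster sum of squares of the best start
```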

ISBN: 9787115493668