Clustering the countries

We'll now apply the k-means algorithm to cluster the countries together:

>>> km = KMeans(3, init='k-means++', random_state = 3425) # initialize
>>> km.fit(df.values)
>>> df['countrySegment'] = km.predict(df.values)
>>> df[:5]

After the preceding code is executed we'll get the following output:

Clustering the countries

Let's find the average GDP per capita for each country segment:

>>> df.groupby('countrySegment').GDPperCapita.mean()
>>> countrySegment
0    13800.586207
1     1624.538462
2    29681.625000
Name: GDPperCapita, dtype: float64

We can see that cluster 2 has the highest average GDP per capita and we can assume that this includes developed countries. ...

Get Mastering Python for Data Science now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.