Python Machine Learning By Example - Second Edition by Yuxi Liu


Clustering newsgroups data using k-means

By this point, you should be familiar with k-means clustering. Let's see what we can mine from the newsgroups dataset using this algorithm. As an example, we use all the data from four categories.

We first load the data from those newsgroups and preprocess it as we did in Chapter 2, Mining the 20 Newsgroups Dataset with Clustering and Topic Modeling Algorithms:

>>> from sklearn.datasets import fetch_20newsgroups
>>> categories = [
...     'alt.atheism',
...     'talk.religion.misc',
...     'comp.graphics',
...     'sci.space',
... ]
>>> groups = fetch_20newsgroups(subset='all',
...                             categories=categories)
>>> labels = groups.target
>>> label_names = groups.target_names
>>> def is_letter_only(word):
...     for char ...
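The listing above is cut off in the middle of the preprocessing helper, so the following is only a minimal sketch of the letter-only filtering idea, not the book's own code. The function names keep_letter_only_words and clean_documents, and the two sample sentences, are hypothetical and chosen for illustration; lemmatization and any other steps from the earlier chapter are left out. It uses only the standard library so that it runs without downloading the dataset:

```python
def keep_letter_only_words(text):
    """Return the lowercased words of `text` that consist solely of letters.

    Hypothetical helper for illustration; not the truncated
    is_letter_only function from the listing above.
    """
    # str.isalpha() is True only when every character is a letter, so
    # tokens such as "2", "3d" or "v2.0" are dropped. Words with attached
    # punctuation (for example "scene,") are dropped as well.
    return [word.lower() for word in text.split() if word.isalpha()]


def clean_documents(documents):
    """Join the surviving words of each document back into one string."""
    return [' '.join(keep_letter_only_words(doc)) for doc in documents]


# Tiny stand-in corpus (an assumption); in this section the input
# would be groups.data from fetch_20newsgroups.
sample_docs = [
    'NASA launched 2 probes in 1977',
    'Render the 3d scene with OpenGL v2.0',
]
data_cleaned = clean_documents(sample_docs)
print(data_cleaned)
```

Splitting on whitespace keeps the sketch short; a real pipeline would more likely rely on a proper tokenizer so that words followed by punctuation are not lost.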
