Exploring the newsgroups data

After we download the 20 newsgroups dataset by whatever means we prefer, the groups data object is cached in memory. It takes the form of a key-value dictionary, with the following keys:

>>> groups.keys()
dict_keys(['data', 'filenames', 'target_names', 'target', 'DESCR'])

The target_names key gives the newsgroup names:

>>> groups['target_names']
['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', ...
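To see how these keys fit together, the following is a minimal sketch rather than the book's own code. It assumes the dataset was loaded with scikit-learn's fetch_20newsgroups, and that each entry of target is an integer index into target_names. The helpers load_groups and count_per_group and the toy dictionary are hypothetical names introduced only for this illustration.

```python
from collections import Counter


def load_groups():
    """Load the dataset with scikit-learn (assumed loader; downloads on first call)."""
    from sklearn.datasets import fetch_20newsgroups  # assumes scikit-learn is installed
    return fetch_20newsgroups()


def count_per_group(groups):
    """Map each newsgroup name to the number of documents labelled with it."""
    names = groups['target_names']
    # Each target value is assumed to be an integer index into target_names.
    counts = Counter(int(t) for t in groups['target'])
    return {name: counts.get(i, 0) for i, name in enumerate(names)}


# Tiny hypothetical stand-in with the same keys, so the helper can be tried offline.
toy = {
    'data': ['text a', 'text b', 'text c'],
    'target_names': ['alt.atheism', 'comp.graphics'],
    'target': [0, 1, 1],
}
print(count_per_group(toy))  # {'alt.atheism': 1, 'comp.graphics': 2}
```

With the real data, count_per_group(load_groups()) would give one count per newsgroup, which is a quick way to check how evenly the documents are spread across the categories.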
