Thinking about features

After we download the 20 newsgroups dataset by whatever means we prefer, the data object groups is available in the program. It takes the form of a key-value dictionary, with the following keys:

>>> groups.keys()
dict_keys(['description', 'target_names', 'target', 'filenames', 'DESCR', 'data'])

The target_names key gives the newsgroup names:

>>> groups['target_names']
['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', ...
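
To see how these keys relate to one another, the following minimal sketch counts documents per newsgroup. It assumes, as is the case for the object returned by scikit-learn's fetch_20newsgroups, that the target key holds one integer per document and that each integer is an index into target_names. The toy_groups dictionary and the label_counts helper are hypothetical stand-ins introduced for illustration only; the values are made up and are not taken from the real dataset.

```python
from collections import Counter

# Hypothetical toy stand-in that mirrors the structure of `groups`.
# The values are invented for illustration; they are not real data.
toy_groups = {
    'target_names': ['alt.atheism', 'comp.graphics', 'sci.space'],
    'target': [2, 0, 1, 2, 2],  # one integer label per document
    'data': ['doc a', 'doc b', 'doc c', 'doc d', 'doc e'],
}


def label_counts(groups):
    """Count documents per newsgroup name.

    Assumes groups['target'] contains integer indices into
    groups['target_names'].
    """
    names = groups['target_names']
    return Counter(names[int(i)] for i in groups['target'])


counts = label_counts(toy_groups)
for name, n in counts.most_common():
    print(name, n)
```

Under the same assumption, passing the real groups object to label_counts in place of toy_groups should give the number of documents in each newsgroup.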
