July 2018
Beginner to intermediate
406 pages
9h 55m
English
We should not expect a perfect clustering, in which all posts from the same newsgroup (for example, comp.graphics) end up in the same cluster. An example will give us a quick impression of the noise that we have to expect. For the sake of simplicity, we will focus on one of the shorter posts:
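>>> # Pair each post with its numeric newsgroup label
>>> post_group = zip(train_data.data, train_data.target)
>>> # Build (length, text, newsgroup name) tuples so that sorting orders by length;
>>> # named all_posts to avoid shadowing the built-in all()
>>> all_posts = [(len(post[0]), post[0], train_data.target_names[post[1]])
...              for post in post_group]
>>> graphics = sorted([post for post in all_posts if post[2]=='comp.graphics'])
>>> print(graphics[5])
(245, 'From: SITUNAYA@IBM3090.BHAM.AC.UK\nSubject: test....(sorry)\nOrganization: The University of Birmingham, United Kingdom\nLines: 1\nNNTP-Posting-Host: ibm3090.bham.ac.uk<...snip...>', ...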
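The selection step above can be tried without downloading the 20 Newsgroups data. The following is a minimal sketch under stated assumptions: the helper `posts_by_length` and the tiny made-up corpus are hypothetical stand-ins for `train_data.data`, `train_data.target`, and `train_data.target_names`, and are not part of the original example.

```python
def posts_by_length(data, target, target_names, group):
    """Return (length, text, group_name) tuples for one newsgroup, shortest first."""
    # Attach the length and the readable newsgroup name to every post
    labeled = [(len(text), text, target_names[label])
               for text, label in zip(data, target)]
    # Tuples sort by their first element, so this orders posts by length
    return sorted(post for post in labeled if post[2] == group)


# Made-up stand-in data (assumption, not the real dataset)
data = [
    "a long post about rendering polygons",
    "test",
    "hockey scores tonight",
    "short gfx",
]
target = [0, 0, 1, 0]
target_names = ["comp.graphics", "rec.sport.hockey"]

graphics = posts_by_length(data, target, target_names, "comp.graphics")
for length, text, name in graphics:
    print(length, name, repr(text))
```

Because each tuple starts with the post length, indexing into the sorted list (as `graphics[5]` does above) picks one of the shortest posts of that newsgroup.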