Classifying newsgroup topics with SVMs

Finally, it is time to build our state-of-the-art SVM-based newsgroup topic classifier using everything we just learned.

First, we load and clean the dataset, this time with all 20 groups, as follows:

>>> categories = None
>>> data_train = fetch_20newsgroups(subset='train',
...                                 categories=categories, random_state=42)
>>> data_test = fetch_20newsgroups(subset='test',
...                                categories=categories, random_state=42)
>>> cleaned_train = clean_text(data_train.data)
>>> label_train = data_train.target
>>> cleaned_test = clean_text(data_test.data)
>>> label_test = data_test.target
>>> term_docs_train = tfidf_vectorizer.fit_transform(cleaned_train)
>>> term_docs_test = tfidf_vectorizer.transform(cleaned_test)
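The loading code relies on a clean_text helper that is defined earlier in the book and not reproduced in this section. For readers following along without it, here is a minimal stand-in. It assumes the helper takes a list of raw documents and returns a list of cleaned strings, and that cleaning means lowercasing and keeping only purely alphabetic tokens. The book's own version may do more, so treat this as a sketch rather than the original implementation.

```python
def clean_text(docs):
    """Hypothetical stand-in for the book's helper (an assumption, not the original).

    Lowercases each document and keeps only whitespace-separated tokens
    that consist entirely of letters.
    """
    cleaned = []
    for doc in docs:
        # str.isalpha() rejects tokens containing digits or punctuation
        tokens = [word.lower() for word in doc.split() if word.isalpha()]
        cleaned.append(' '.join(tokens))
    return cleaned


# Toy input, made up purely for illustration
sample = ["Hello World 123", "SVMs classify text!"]
cleaned_sample = clean_text(sample)
```

Note that a token with attached punctuation, such as "text!", is dropped entirely rather than stripped. That keeps the sketch short but discards more words than a tokenizer-based approach would.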

As we have seen that the ...
