News topic classification with support vector machine

It is finally time to build our state-of-the-art, SVM-based news topic classifier using everything we have just learned.

Load and clean the news dataset, this time with all 20 newsgroups:

>>> from sklearn.datasets import fetch_20newsgroups
>>> # clean_text and tfidf_vectorizer are defined earlier (not shown in this excerpt)
>>> categories = None  # None loads all 20 newsgroups
>>> data_train = fetch_20newsgroups(subset='train',
...                                 categories=categories, random_state=42)
>>> data_test = fetch_20newsgroups(subset='test',
...                                categories=categories, random_state=42)
>>> cleaned_train = clean_text(data_train.data)
>>> label_train = data_train.target
>>> cleaned_test = clean_text(data_test.data)
>>> label_test = data_test.target
>>> term_docs_train = tfidf_vectorizer.fit_transform(cleaned_train)
>>> term_docs_test = tfidf_vectorizer.transform(cleaned_test)
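
The listing above depends on a clean_text helper that is not shown in this excerpt. As a rough illustration only, the sketch below shows one way such a cleaning step might work. The name clean_text_sketch and its behavior (lowercasing and keeping purely alphabetic tokens) are assumptions for illustration, not the book's actual implementation, which may also lemmatize words or filter out names.

```python
def clean_text_sketch(docs):
    """Hypothetical stand-in for a text-cleaning helper (an assumption,
    not the original clean_text).

    Splits each document on whitespace, keeps only purely alphabetic
    tokens, lowercases them, and joins them back into one string.
    """
    cleaned_docs = []
    for doc in docs:
        tokens = [word.lower() for word in doc.split() if word.isalpha()]
        cleaned_docs.append(' '.join(tokens))
    return cleaned_docs


# Toy documents, made up for this example
sample_docs = ["NASA launched 2 probes in 1977", "The GPU has 8GB of memory"]
cleaned_sample = clean_text_sketch(sample_docs)
print(cleaned_sample)
```

Tokens that contain digits or punctuation, such as "1977" or "8GB", are dropped entirely, which shrinks the vocabulary that the tf-idf vectorizer later has to handle.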

Recall that the linear kernel is ...
