Machine Learning: End-to-End guide for Java developers

by Richard M. Reese, Jennifer L. Reese, Boštjan Kaluža, Dr. Uday Kamath, Krishna Choppella
October 2017
Intermediate to advanced
1159 pages
26h 10m
English
Packt Publishing

Topic modeling for BBC news

As discussed earlier, the goal of topic modeling is to identify patterns in a text corpus that correspond to document topics. In this example, we will use a dataset originating from BBC news. This dataset is one of the standard benchmarks in machine learning research, and is available for non-commercial and research purposes.

For this dataset, the aim is to build a classifier that can assign a topic to an uncategorized document.

BBC dataset

Greene and Cunningham (2006) collected the BBC dataset to study a particular document-clustering challenge using support vector machines. The dataset consists of 2,225 documents from the BBC News website from 2004 to 2005, corresponding to the stories collected from five topical areas: business, ...
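Before any model can be trained, the documents and their topic labels have to be read into memory. The following is a minimal sketch of that step, not the book's own code. It assumes the corpus has been unpacked into a root folder with one subfolder per topical area, each holding plain-text files. The class name `LabeledCorpus`, its methods, and the default folder name `bbc` are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: assumes a layout of <root>/<topic>/<document file>.
public class LabeledCorpus {

    /** A document paired with the topic label taken from its parent folder. */
    public static final class Doc {
        public final String topic;
        public final String text;

        public Doc(String topic, String text) {
            this.topic = topic;
            this.text = text;
        }
    }

    /** Lists either the subdirectories or the regular files of dir, sorted by path. */
    private static List<Path> list(Path dir, boolean wantDirectories) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(p -> wantDirectories ? Files.isDirectory(p) : Files.isRegularFile(p))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Reads every file under each topic folder and labels it with the folder name. */
    public static List<Doc> load(Path root) throws IOException {
        List<Doc> docs = new ArrayList<>();
        for (Path topicDir : list(root, true)) {
            String topic = topicDir.getFileName().toString();
            for (Path file : list(topicDir, false)) {
                // Decoding as UTF-8 is an assumption; malformed bytes are replaced, not rejected.
                byte[] bytes = Files.readAllBytes(file);
                docs.add(new Doc(topic, new String(bytes, StandardCharsets.UTF_8)));
            }
        }
        return docs;
    }

    /** Counts documents per topic, with topics in alphabetical order. */
    public static Map<String, Integer> countByTopic(List<Doc> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : docs) {
            counts.merge(d.topic, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "bbc");
        if (!Files.isDirectory(root)) {
            System.out.println("Usage: java LabeledCorpus <corpus-root-folder>");
            return;
        }
        System.out.println(countByTopic(load(root)));
    }
}
```

Printing the per-topic counts is a quick sanity check that all topical areas were found and that the total matches the expected number of documents before moving on to feature extraction.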

ISBN: 9781788622219