Machine Learning in Java - Second Edition
by AshishSingh Bhatia, Bostjan Kaluza
November 2018
Intermediate to advanced
300 pages
7h 42m
English
Packt Publishing
Content preview from Machine Learning in Java - Second Edition

BBC dataset

In 2006, Greene and Cunningham collected the BBC dataset to study a particular document-clustering challenge using support vector machines. The dataset consists of 2,225 documents published on the BBC News website in 2004 and 2005, each belonging to one of five topical areas: business, entertainment, politics, sport, and technology. The dataset is available at http://mlg.ucd.ie/datasets/bbc.html.

We can download the raw text files under the Dataset: BBC section. The website also offers an already processed version of the dataset, but for this example we want to process the data ourselves. The ZIP contains five folders, one per topic. The actual documents are placed in the corresponding ...
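
The folder-per-topic layout described above means the folder name can serve as the class label for each document. The following is a minimal sketch of that idea using only the standard Java library; it is not the book's own loading code. The class name BbcCorpusLoader, the method listDocumentsByTopic, and the default extraction directory "bbc" are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not from the book): groups raw text files by topic folder.
class BbcCorpusLoader {

    /** Maps each topic (the sub-folder name) to the sorted list of files inside it. */
    static Map<String, List<Path>> listDocumentsByTopic(Path root) throws IOException {
        Map<String, List<Path>> byTopic = new TreeMap<>();
        try (Stream<Path> entries = Files.list(root)) {
            // Only sub-folders count as topics; loose files at the top level are skipped.
            List<Path> topicDirs = entries.filter(Files::isDirectory)
                                          .collect(Collectors.toList());
            for (Path topicDir : topicDirs) {
                try (Stream<Path> docs = Files.list(topicDir)) {
                    List<Path> files = docs.filter(Files::isRegularFile)
                                           .sorted()
                                           .collect(Collectors.toList());
                    byTopic.put(topicDir.getFileName().toString(), files);
                }
            }
        }
        return byTopic;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the ZIP was extracted into a directory named "bbc".
        Path root = Paths.get(args.length > 0 ? args[0] : "bbc");
        Map<String, List<Path>> byTopic = listDocumentsByTopic(root);
        int total = 0;
        for (Map.Entry<String, List<Path>> entry : byTopic.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue().size() + " documents");
            total += entry.getValue().size();
        }
        System.out.println("total: " + total);
    }
}
```

Run against the extracted archive, this should list five topics, and the total should come to 2,225 if the archive matches the description above. A count that differs is a quick sign that the extraction path is wrong.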


Publisher Resources

ISBN: 9781788474399