Python Machine Learning By Example - Second Edition

by Yuxi (Hayden) Liu

February 2019

Beginner to intermediate

382 pages

10h 1m

English

Packt Publishing

Content preview from Python Machine Learning By Example - Second Edition

Getting the newsgroups data

The first project in this book uses the 20 newsgroups dataset. As its name implies, it is composed of text taken from newsgroup articles. It was originally collected by Ken Lang and has since been widely used for experiments in text applications of machine learning, particularly NLP techniques.

The data contains approximately 20,000 documents across 20 online newsgroups. A newsgroup is a place on the internet where people can ask and answer questions about a certain topic. The data has already been cleaned to a certain degree and split into training and testing sets, with the split determined by a cutoff date.
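To make the idea of a date-based split concrete, here is a minimal sketch using only the standard library. It assumes that posts dated before the cutoff go to the training set and the rest go to the testing set. The `Post` type, the `split_by_date` helper, the sample posts, and the cutoff date are all hypothetical and invented for illustration; none of them are taken from the dataset itself.

```python
from datetime import date
from typing import NamedTuple


class Post(NamedTuple):
    """A single newsgroup post (hypothetical structure for this sketch)."""
    posted: date
    newsgroup: str
    text: str


def split_by_date(posts, cutoff):
    """Return (train, test): posts dated before the cutoff, then the rest."""
    train = [p for p in posts if p.posted < cutoff]
    test = [p for p in posts if p.posted >= cutoff]
    return train, test


# Toy posts, invented for illustration only (not real dataset entries).
posts = [
    Post(date(1993, 3, 1), "sci.space", "Question about orbital mechanics"),
    Post(date(1993, 4, 10), "rec.autos", "Which tyres for winter?"),
    Post(date(1993, 4, 20), "sci.space", "Shuttle launch schedule"),
    Post(date(1993, 5, 2), "comp.graphics", "Ray tracing basics"),
]

# Assumed cutoff date, chosen arbitrarily for this example.
train, test = split_by_date(posts, cutoff=date(1993, 4, 15))
print(len(train), len(test))
```

Splitting by time rather than at random means the testing set only contains posts written after everything the model was trained on, which is closer to how a text classifier would be used on new articles. With the real data there is no need to do this by hand, since the split is already provided.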

The original data comes from http://qwone.com/~jason/20Newsgroups/, with 20 different topics ...

Publisher Resources

ISBN: 9781789616729
