March 2018
Intermediate to advanced
272 pages
7h 53m
English
In this example, we're going to use a somewhat famous text classification problem known as the 20 newsgroups problem (http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.html). In this problem, we are given 19,997 documents, each belonging to one of 20 newsgroups. Our goal is to use the text of each post to predict which newsgroup it belongs to. For the millennials among us, a newsgroup is sort of the precursor to Reddit (but it's probably closer to the great-great-great grandfather of Reddit). The topics covered in those newsgroups vary greatly and include politics, religion, and operating systems, all of which you should avoid discussing in polite company. These posts ...