Mastering Predictive Analytics with R - Second Edition by Rui Miguel Forte, James D. Miller

Modeling the topics of online news stories

To see how topic models perform on real data, we will look at two datasets of articles published by BBC News in 2004 and 2005. The first, which we will refer to as the BBC dataset, contains 2,225 articles grouped into five topics: business, entertainment, politics, sports, and technology.

The second dataset, which we will call the BBCSports dataset, contains 737 articles, all about sports. These are also grouped into five categories, one per sport: athletics, cricket, football, rugby, and tennis. Our objective will be to see if we can build topic models for each of these two datasets ...
