O'Reilly logo

Mastering Machine Learning with Spark 2.x by Michal Malohlava, Max Pumperla, Alex Tellez

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Pattern mining on MSNBC clickstream data

Having spent a considerable amount of time explaining the basics of pattern mining, let's finally turn to a more realistic application. The data we will be discussing next comes from server logs from http://msnbc.com (and in parts from http://msn.com, when news-related), and represents a full day's worth of browsing activity in terms of page views of users of these sites. The data collected in September 1999 and has been made available for download at http://archive.ics.uci.edu/ml/machine-learning-databases/msnbc-mld/msnbc990928.seq.gz. Storing this file locally and unzipping it, the msnbc990928.seq file essentially consists of a header and space-separated rows of integers of varying length. The following ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required