Pattern mining on MSNBC clickstream data

Having spent a considerable amount of time explaining the basics of pattern mining, let's finally turn to a more realistic application. The data we will be discussing next comes from server logs from http://msnbc.com (and in parts from http://msn.com, when news-related), and represents a full day's worth of browsing activity, recorded as page views by users of these sites. The data was collected in September 1999 and has been made available for download at http://archive.ics.uci.edu/ml/machine-learning-databases/msnbc-mld/msnbc990928.seq.gz. Once you have stored this file locally and unzipped it, you will see that the msnbc990928.seq file essentially consists of a header and space-separated rows of integers of varying length. The following ...
