Sequential pattern mining with prefix span

Turning to sequential pattern mining, the prefix span algorithm is a little more complicated than association rules, so we need to take a step back and explain the basics first. Prefix span was first described in http://hanj.cs.illinois.edu/pdf/tkde04_spgjn.pdf as a natural extension of the so-called FreeSpan algorithm. The algorithm represents a notable improvement over other approaches, such as Generalized Sequential Patterns (GSP). GSP is based on the apriori principle, so the drawbacks we discussed earlier for apriori-based algorithms carry over to sequential mining as well: expensive candidate generation, multiple database scans, and so on.
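To make the contrast with candidate generation more concrete, here is a minimal sketch of the prefix-projection idea in plain Python. It is not the Spark implementation and not the authors' reference code. It assumes every element of a sequence is a single item rather than an itemset, and the names `prefix_span`, `mine`, and `min_support` are chosen for illustration only. The point is that each frequent prefix is grown only from the suffixes that actually follow it, so no candidate patterns are generated and tested against the full database.

```python
from collections import defaultdict


def prefix_span(sequences, min_support):
    """Return {pattern: support} for all frequent sequential patterns.

    Simplifying assumption: each sequence is a list of single items,
    not a list of itemsets as in the general problem statement.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs, that is,
        # the suffixes that remain after matching `prefix`.
        support = defaultdict(int)
        for seq_idx, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[seq_idx][start:]):
                support[item] += 1

        for item, count in sorted(support.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count

            # Build the projected database for the extended prefix:
            # keep what follows the first occurrence of `item`.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                new_projected.append((seq_idx, pos + 1))

            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    # Toy database, made up purely for illustration.
    db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
    for pattern, support in sorted(prefix_span(db, min_support=2).items()):
        print(pattern, support)
```

Storing only index and offset pairs instead of copying suffixes keeps the projected databases cheap, which is a design choice of this sketch rather than something prescribed by the text above.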

Prefix span, ...
